Back It Off a Quarter Turn

A friend who is actually educated in the workings of LLMs described some aspects of LLM training. A sequence of train-then-verify loops produces models of varying quality. The trainers watch for performance to improve, and eventually see it level off or start to decline with overtraining. Then they select the model that exhibited peak performance, which is generally not the last one.
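For the curious, here is a minimal sketch of that selection step in Python. It assumes each train-then-verify loop leaves behind one verification score, with higher being better. The function name and the scores are made up for illustration; they do not come from any real training run or library.

```python
def select_best_checkpoint(val_scores):
    """Return (index, score) of the checkpoint with the highest verification score.

    val_scores holds one score per train-then-verify loop, in training order.
    Assumes higher is better. On a tie, the earliest checkpoint wins.
    """
    if not val_scores:
        raise ValueError("need at least one checkpoint score")
    best_index = max(range(len(val_scores)), key=lambda i: val_scores[i])
    return best_index, val_scores[best_index]


# Hypothetical scores: performance climbs, levels off, then declines.
scores = [0.61, 0.70, 0.76, 0.79, 0.78, 0.74]
best_index, best_score = select_best_checkpoint(scores)
print(f"keep checkpoint {best_index} of {len(scores) - 1} (score {best_score})")
```

With these invented numbers the peak is the fourth checkpoint (index 3), two loops before training stopped.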

So the old engineering joke lives on!

How hard should you torque a bolt?

Tighten it until it strips, then back it off a quarter turn.