Cross-Entropy Loss
The standard loss function for language modeling.
Measures how well the model's predicted probability distribution matches the true distribution, which for next-token prediction places all of its mass on the correct next token. In that case the loss at each position is the negative log of the probability the model assigns to the correct token, usually averaged over all positions. It is also commonly reported as an evaluation metric. Ranges from 0 (the model assigns probability 1 to every correct token) with no upper bound (the loss grows toward infinity as the assigned probability approaches 0). Lower loss = better fit to the data.
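As a rough illustration of the definition, the sketch below computes cross-entropy for a single next-token prediction in plain Python. It assumes the model outputs raw scores (logits) that are turned into probabilities with a softmax, and that the natural logarithm is used. The function names `softmax`, `cross_entropy`, and `mean_cross_entropy` are hypothetical and chosen for this example only; they are not taken from any particular library.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct token."""
    probs = softmax(logits)
    return -math.log(probs[target_index])

def mean_cross_entropy(batch_logits, targets):
    """Average the per-position losses over a sequence or batch."""
    losses = [cross_entropy(l, t) for l, t in zip(batch_logits, targets)]
    return sum(losses) / len(losses)

# Toy vocabulary of 4 tokens; the correct next token has index 2.
uniform = cross_entropy([0.0, 0.0, 0.0, 0.0], 2)     # model has no preference
confident = cross_entropy([0.0, 0.0, 10.0, 0.0], 2)  # model favors the correct token
wrong = cross_entropy([10.0, 0.0, 0.0, 0.0], 2)      # model favors a wrong token

print(uniform, confident, wrong)
```

With uniform scores over 4 tokens the loss equals log(4). It falls toward 0 as the model concentrates probability on the correct token and rises without bound as that probability shrinks.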