Nucleus Sampling (Top-P Sampling)
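A generation strategy that samples the next token only from the smallest set of most likely tokens whose cumulative probability reaches a threshold p (e.g., top_p=0.9 means keep the most likely tokens until their probabilities sum to at least 90%). The probabilities of the kept tokens are renormalized before sampling. This avoids sampling very unlikely tokens while still allowing some randomness.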
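The definition above can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular library's implementation: the function names top_p_filter and sample_top_p are hypothetical, and the input is assumed to be a list of next-token probabilities that already sums to 1.

```python
import random

def top_p_filter(probs, top_p=0.9):
    """Return {token_index: renormalized_prob} for the nucleus of `probs`.

    Hypothetical helper for illustration; assumes `probs` sums to 1.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Sort token indices by probability, most likely first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Stop once the kept tokens cover at least top_p of the mass.
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / cumulative for i in kept}

def sample_top_p(probs, top_p=0.9, rng=random):
    """Draw one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, top_p)
    tokens = list(nucleus)
    weights = [nucleus[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

For example, with probabilities [0.5, 0.3, 0.15, 0.05] and top_p=0.9, the first three tokens are kept (their cumulative probability is 0.95) and the fourth can never be sampled. Lower values of top_p shrink the candidate set and make the output more deterministic.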