Capacity factor
A multiplier that sets the maximum number of tokens each expert can process per batch: `capacity = (batch_tokens / n_experts) × capacity_factor`.
A factor of 1.0 gives each expert exactly its fair share of the batch; 2.0 lets each expert accept up to twice that. Tokens routed to an expert that is already at capacity are dropped, meaning that expert does not process them. A higher capacity factor reduces dropping but increases memory cost, since each expert's buffer is sized for its full capacity.
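As a rough illustration of the formula, the sketch below computes the per-expert capacity and counts how many tokens would be dropped for a given routing. The function names, the top-1 routing, and the choice to round capacity up are assumptions made for this example; real implementations differ on rounding and on how overflow tokens are handled.

```python
import math
from collections import Counter


def expert_capacity(batch_tokens, n_experts, capacity_factor):
    """capacity = (batch_tokens / n_experts) * capacity_factor.

    Rounding up to a whole token is an assumption of this sketch.
    """
    return math.ceil(batch_tokens / n_experts * capacity_factor)


def count_dropped(assignments, n_experts, capacity_factor):
    """Count tokens that exceed their expert's capacity.

    `assignments` holds one expert index per token (top-1 routing assumed).
    """
    capacity = expert_capacity(len(assignments), n_experts, capacity_factor)
    load = Counter(assignments)
    return sum(max(0, n - capacity) for n in load.values())


# 8 tokens, 4 experts, with expert 0 heavily oversubscribed.
assignments = [0, 0, 0, 0, 0, 1, 2, 3]

dropped_at_1 = count_dropped(assignments, 4, 1.0)  # capacity 2, so 3 dropped
dropped_at_2 = count_dropped(assignments, 4, 2.0)  # capacity 4, so 1 dropped
```

With this unbalanced routing, raising the factor from 1.0 to 2.0 cuts dropped tokens from 3 to 1, while every expert's buffer doubles from 2 to 4 slots.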