Load balancing
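In mixture-of-experts models: the goal of ensuring that all n experts receive roughly equal numbers of tokens over training. With a balanced load, every expert sees enough varied data to train properly and can develop its own specialisation. With an imbalanced load, the router sends most tokens to a few experts while the rest are under-trained, which can end in expert collapse.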
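
As a rough illustration of the expert-load definition above, the sketch below counts how many tokens each expert receives and reports a simple imbalance ratio. The function names `expert_load` and `imbalance`, the list-of-expert-indices input format, and the choice of max-share-over-uniform-share as the metric are all assumptions made for this example, not part of any particular framework.

```python
from collections import Counter

def expert_load(assignments, n_experts):
    """Return the fraction of tokens routed to each of the n experts.

    `assignments` is a hypothetical list holding one expert index per token.
    """
    if not assignments:
        raise ValueError("need at least one routed token")
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(n_experts)]

def imbalance(load):
    """Largest share divided by the uniform share 1/n.

    1.0 means perfectly balanced; n means one expert takes every token.
    """
    return max(load) * len(load)

# Eight tokens spread evenly over four experts.
balanced = expert_load([0, 1, 2, 3, 0, 1, 2, 3], n_experts=4)
# Eight tokens where expert 0 dominates and expert 3 receives nothing.
skewed = expert_load([0, 0, 0, 0, 0, 0, 1, 2], n_experts=4)

print(imbalance(balanced))  # 1.0
print(imbalance(skewed))    # 3.0
```

A ratio that stays well above 1.0 over many batches, or an expert whose share stays at zero, would be an early sign of the collapse described above.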
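In Ring Attention: ensuring that all P GPUs have roughly equal work per round. When the load is imbalanced (some GPUs finish a round sooner than others), the faster GPUs sit idle until the slowest one completes, which reduces overall throughput. Balanced load is critical for Ring Attention's linear speedup.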
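
The cost of idle time can be sketched with a small calculation. It assumes that a round ends only when the slowest GPU has finished, so every other GPU waits for it. The function name `round_efficiency` and the work figures (arbitrary time units) are hypothetical.

```python
def round_efficiency(work_per_gpu):
    """Fraction of total GPU time spent on useful work in one round.

    Assumes the round lasts as long as the most heavily loaded GPU,
    and that the remaining GPUs idle once their own work is done.
    """
    slowest = max(work_per_gpu)
    return sum(work_per_gpu) / (len(work_per_gpu) * slowest)

# Four GPUs with equal work: nobody waits.
even = round_efficiency([2.0, 2.0, 2.0, 2.0])
# Same number of GPUs, uneven work: three GPUs wait for the fourth.
uneven = round_efficiency([1.0, 2.0, 3.0, 4.0])

print(even)    # 1.0
print(uneven)  # 0.625
```

Under this assumption, P GPUs deliver a speedup of about P times the efficiency, so only a value near 1.0 is compatible with linear scaling.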