Sequence Parallelism
Parallelising the sequence dimension of tensors.
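Sequence parallelism splits tensors along their sequence (token) dimension, so that each device holds only a slice of a long input. Ring Attention is one implementation. It complements the other forms of parallelism used in modern distributed training: data parallelism (which splits batches), tensor parallelism (model weights), and pipeline parallelism (layers).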
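
To make the idea concrete, here is a minimal single-process sketch of ring-style attention in NumPy. It is an illustration under stated assumptions, not a reference implementation of Ring Attention: the "devices" are simulated with a Python loop, attention is non-causal and single-head, the sequence length is assumed to be at least num_devices, and the function names (full_attention, ring_attention_sim) are hypothetical. Each simulated device keeps its own query block, sees one key/value block per step as if the blocks were passed around a ring, and merges the partial results with a running (online) softmax.

```python
import numpy as np

def full_attention(q, k, v):
    """Reference: ordinary softmax attention over the whole sequence."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    return (weights / weights.sum(axis=-1, keepdims=True)) @ v

def ring_attention_sim(q, k, v, num_devices):
    """Simulate sequence-parallel attention with K/V blocks visited in ring order.

    q, k, v have shape (seq_len, dim) and are split along the sequence axis.
    Assumes seq_len >= num_devices so that no block is empty.
    """
    dim = q.shape[-1]
    q_blocks = np.array_split(q, num_devices, axis=0)
    k_blocks = np.array_split(k, num_devices, axis=0)
    v_blocks = np.array_split(v, num_devices, axis=0)

    outputs = []
    for i, q_i in enumerate(q_blocks):            # one iteration per simulated device
        rows = q_i.shape[0]
        running_max = np.full((rows, 1), -np.inf)  # max score seen so far, per row
        running_sum = np.zeros((rows, 1))          # softmax denominator so far
        acc = np.zeros((rows, v.shape[-1]))        # unnormalised weighted sum of V
        for step in range(num_devices):
            j = (i + step) % num_devices           # K/V block held at this ring step
            s = q_i @ k_blocks[j].T / np.sqrt(dim)
            new_max = np.maximum(running_max, s.max(axis=-1, keepdims=True))
            rescale = np.exp(running_max - new_max)  # correct earlier partial results
            p = np.exp(s - new_max)
            running_sum = running_sum * rescale + p.sum(axis=-1, keepdims=True)
            acc = acc * rescale + p @ v_blocks[j]
            running_max = new_max
        outputs.append(acc / running_sum)
    return np.concatenate(outputs, axis=0)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
assert np.allclose(ring_attention_sim(q, k, v, num_devices=4), full_attention(q, k, v))
```

In a real multi-device setting the inner loop would be replaced by point-to-point communication between neighbouring devices, ideally overlapped with the block computation. The sketch only shows that the blockwise result matches full attention while no simulated device ever needs the whole sequence of keys and values at once.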