Communication Complexity
The amount of data that must be transferred between GPUs.
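In Ring Attention with P GPUs, each KV chunk is passed around the ring until every GPU has seen it, which takes P - 1 transfers per chunk. For sequence length n and hidden dimension d, each chunk holds about (n / P) × d elements for K and the same for V, so each GPU sends and receives O(n × d) data in total. Because the transfers can overlap with the blockwise attention computation, the communication cost is hidden when the interconnect bandwidth is high enough relative to the per-chunk compute time.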
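The arithmetic behind the O(n × d) figure can be sketched as below. This is a rough estimate, not a measurement: the function name, the 2-byte element size, and the assumption that n divides evenly across GPUs are all illustrative choices, and real implementations may differ in details such as a final return hop.

```python
def ring_attention_bytes_sent_per_gpu(n, d, num_gpus, bytes_per_elem=2):
    """Estimate bytes each GPU sends during one Ring Attention pass.

    Assumes (hypothetically) that the sequence of length n splits evenly
    across num_gpus and that K and V each have hidden dimension d.
    """
    chunk_tokens = n // num_gpus
    # One KV chunk holds a K block and a V block, hence the factor of 2.
    chunk_bytes = 2 * chunk_tokens * d * bytes_per_elem
    # Each GPU forwards a chunk on every step except the last: P - 1 hops.
    hops = num_gpus - 1
    return hops * chunk_bytes


# Example: 8192 tokens, hidden size 1024, 8 GPUs, 16-bit elements.
sent = ring_attention_bytes_sent_per_gpu(n=8192, d=1024, num_gpus=8)
upper_bound = 2 * 8192 * 1024 * 2  # full K and V: 2 * n * d * bytes_per_elem
print(sent, upper_bound)
```

The per-GPU volume stays just under the size of the full K and V tensors regardless of the number of GPUs, which is where the O(n × d) bound comes from.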