Efficient Attention
Modified attention patterns (sliding window, global, sparse) that reduce the cost of attention from O(n²) in the sequence length n to O(n log n) or O(n), making long sequences tractable. Gemini uses a combination of these for its 32K-token context.
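
As a rough illustration of the sliding-window and global patterns named above, the sketch below builds the corresponding attention mask in NumPy and applies it. The function names (`sliding_window_mask`, `masked_attention`) and the parameters `window` and `global_idx` are hypothetical, chosen for this example only; nothing here is taken from Gemini or any specific model. The dense masking shown still computes all n² scores, so it demonstrates only which query-key pairs are kept. A real efficient implementation would compute just the allowed pairs.

```python
import numpy as np

def sliding_window_mask(n, window, global_idx=()):
    """Boolean (n, n) mask: True where query i may attend to key j."""
    idx = np.arange(n)
    # Local band: each token sees neighbours at most `window` positions away.
    mask = np.abs(idx[:, None] - idx[None, :]) <= window
    # Global tokens attend to every position and are attended to by every position.
    for g in global_idx:
        mask[g, :] = True
        mask[:, g] = True
    return mask

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention restricted to the pairs allowed by `mask`."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.where(mask, scores, -np.inf)              # drop disallowed pairs
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Allowed pairs grow roughly as n * (2 * window + 1), i.e. linearly in n,
# rather than as n * n for full attention.
for n in (64, 128, 256):
    print(n, int(sliding_window_mask(n, window=4, global_idx=(0,)).sum()), n * n)
```

With a fixed window and a fixed number of global tokens, the count of allowed pairs printed in the loop grows linearly with n, which is where the O(n) figure comes from.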