# Attention Benchmarks
Performance results for attention operators. Shapes are given as (B, H, S, D): batch size, number of attention heads, sequence length, and head dimension.
> **Note:** This page will be populated with benchmark data from nightly CI runs.
## Flash Attention Forward
| Shape (B, H, S, D) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | TFLOPS |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
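The TFLOPS column can be derived from the shape and the measured runtime. A minimal sketch, assuming the usual flash-attention accounting of 4·B·H·S²·D FLOPs for the forward pass (the QKᵀ and PV matmuls at 2·B·H·S²·D each, with the softmax cost ignored); the helper names here are illustrative, not part of the TileOPs API:

```python
def attn_fwd_flops(B: int, H: int, S: int, D: int) -> int:
    # QK^T matmul: 2*B*H*S*S*D FLOPs; softmax(QK^T) @ V: another 2*B*H*S*S*D.
    # Softmax itself is ignored, following the common flash-attention convention.
    return 4 * B * H * S * S * D

def tflops(flops: int, runtime_ms: float) -> float:
    # Convert a FLOP count and a runtime in milliseconds to TFLOPS.
    return flops / (runtime_ms * 1e-3) / 1e12

# Hypothetical example: B=8, H=32, S=4096, D=128 measured at 10 ms.
print(f"{tflops(attn_fwd_flops(8, 32, 4096, 128), 10.0):.1f} TFLOPS")
```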
## Flash Attention Decode
| Shape (B, H, S, D) | Dtype | TileOPs (ms) | Baseline (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
## GQA Forward
| Shape | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | TFLOPS |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
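For grouped-query attention, the FLOP count matches multi-head attention with the query head count, since every query head still performs full QKᵀ and PV matmuls against its shared KV head; what shrinks is the KV cache, by the grouping factor H_q / H_kv. A hedged sketch of both quantities, with illustrative helper names:

```python
def gqa_fwd_flops(B: int, Hq: int, S: int, D: int) -> int:
    # Compute cost is governed by the number of *query* heads:
    # each query head does 2*B*S*S*D FLOPs for QK^T and the same for PV.
    return 4 * B * Hq * S * S * D

def gqa_kv_cache_bytes(B: int, Hkv: int, S: int, D: int, dtype_bytes: int = 2) -> int:
    # Only Hkv key/value heads are stored, so the KV cache is a factor
    # of Hq / Hkv smaller than the multi-head-attention equivalent.
    return 2 * B * Hkv * S * D * dtype_bytes
```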