
Attention Benchmarks

Performance results for attention operators.

Note

This page will be populated with benchmark data from nightly CI runs.

Flash Attention Forward

| Shape (B, H, S, D) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | TFLOPS |
|---|---|---|---|---|---|
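The page does not state how the Speedup and TFLOPS columns are derived. The following is a minimal sketch that uses the common convention of counting only the two matmuls (Q·Kᵀ and P·V), each costing 2·B·H·S²·D FLOPs. It ignores the softmax and assumes a causal mask halves the work. The names `attention_fwd_flops`, `tflops`, and `speedup` are hypothetical helpers, not TileOPs APIs.

```python
def attention_fwd_flops(b: int, h: int, s: int, d: int, causal: bool = False) -> float:
    """Approximate FLOPs of one attention forward pass.

    Assumption: only the two matmuls are counted (Q @ K^T and P @ V),
    each 2 * B * H * S * S * D FLOPs; softmax cost is ignored.
    A causal mask is assumed to skip roughly half of the work.
    """
    flops = 4.0 * b * h * s * s * d
    return flops / 2 if causal else flops


def tflops(flops: float, ms: float) -> float:
    # Convert a timing in milliseconds into throughput in TFLOPS (1e12 FLOP/s).
    return flops / (ms * 1e-3) / 1e12


def speedup(baseline_ms: float, tileops_ms: float) -> float:
    # Ratio of baseline time to TileOPs time; values above 1.0 mean TileOPs is faster.
    return baseline_ms / tileops_ms
```

Under these assumptions, a row's TFLOPS value would be `tflops(attention_fwd_flops(B, H, S, D), tileops_ms)` and its Speedup would be `speedup(pytorch_ms, tileops_ms)`. The page does not say whether the nightly runs use causal masking.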

Flash Attention Decode

| Shape (B, H, S, D) | Dtype | TileOPs (ms) | Baseline (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
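Decode kernels are usually reported in GB/s rather than TFLOPS. With a single query token the arithmetic intensity is low, so memory traffic bounds the kernel. The sketch below estimates that traffic for one decode step. It assumes the step reads the full K and V caches of length S plus one query vector per head, and writes one output vector per head. The page does not specify how its bandwidth figures are computed. The helper names and the `BYTES_PER_ELEM` table are illustrative.

```python
# Element sizes for common attention dtypes (illustrative lookup table).
BYTES_PER_ELEM = {"float16": 2, "bfloat16": 2, "float32": 4}


def decode_bytes(b: int, h: int, s: int, d: int, bytes_per_elem: int) -> int:
    """Approximate bytes moved by one attention decode step.

    Assumption: traffic is dominated by reading the K and V caches
    (2 * B * H * S * D elements). The single query read and output write
    (B * H * D elements each) are included but are small for long S.
    """
    kv_elems = 2 * b * h * s * d
    q_and_out_elems = 2 * b * h * d
    return (kv_elems + q_and_out_elems) * bytes_per_elem


def bandwidth_gbps(num_bytes: float, ms: float) -> float:
    # Effective bandwidth in GB/s, using GB = 1e9 bytes.
    return num_bytes / (ms * 1e-3) / 1e9
```

Under these assumptions, a row's bandwidth would be `bandwidth_gbps(decode_bytes(B, H, S, D, BYTES_PER_ELEM[dtype]), tileops_ms)`.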

GQA Forward

| Shape | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | TFLOPS |
|---|---|---|---|---|---|
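In grouped-query attention (GQA), several query heads share one K/V head. Compute therefore scales with the number of query heads, while the K/V tensors shrink. The Shape column here does not spell out its dimensions. This sketch assumes separate query and KV head counts (`hq`, `hkv`) and the usual grouping, in which each consecutive block of `hq // hkv` query heads maps to the same KV head. All helpers are hypothetical, not TileOPs APIs.

```python
def gqa_kv_head(q_head: int, hq: int, hkv: int) -> int:
    # Map a query head index to the KV head it reads from (consecutive grouping).
    if hq % hkv != 0:
        raise ValueError("number of query heads must be a multiple of KV heads")
    if not 0 <= q_head < hq:
        raise ValueError("query head index out of range")
    group_size = hq // hkv
    return q_head // group_size


def gqa_fwd_flops(b: int, hq: int, s: int, d: int, causal: bool = False) -> float:
    # Same matmul count as standard attention, taken over the query heads;
    # sharing K/V heads reduces memory, not the score/value FLOPs.
    flops = 4.0 * b * hq * s * s * d
    return flops / 2 if causal else flops


def kv_elems(b: int, hkv: int, s: int, d: int) -> int:
    # K and V each hold B * Hkv * S * D elements; fewer KV heads means smaller K/V tensors.
    return 2 * b * hkv * s * d
```

With `hq = 8` and `hkv = 2`, for example, query heads 0 to 3 read KV head 0 and heads 4 to 7 read KV head 1. The K/V tensors are a quarter the size they would be with 8 KV heads.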