Normalization Benchmarks
Note
This page will be populated with benchmark data from nightly CI runs.
LayerNorm
| Shape (B, S, D) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
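
Until the nightly data lands, the sketch below shows one way the Speedup and Bandwidth (GB/s) columns could be derived from the timing columns. It is not taken from the TileOPs benchmark harness: the function names, the `DTYPE_BYTES` table, and the byte-count model (one read of the input plus one write of the output, ignoring the much smaller weight and bias vectors) are assumptions made for illustration.

```python
# Hypothetical helpers, not part of TileOPs. The traffic model is an assumption:
# the kernel reads the (B, S, D) input once and writes the output once.
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2}


def speedup(baseline_ms: float, tileops_ms: float) -> float:
    """Ratio of baseline time to TileOPs time; values above 1 favor TileOPs."""
    return baseline_ms / tileops_ms


def effective_bandwidth_gbs(shape, dtype: str, elapsed_ms: float) -> float:
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes) under the assumed model."""
    b, s, d = shape
    bytes_moved = 2 * b * s * d * DTYPE_BYTES[dtype]  # one read + one write
    return bytes_moved / (elapsed_ms * 1e-3) / 1e9
```

Comparing the resulting figure with the device's peak memory bandwidth gives a rough sense of how close a memory-bound kernel is to the hardware limit.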
RmsNorm
| Shape (B, S, D) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
Fused Add + LayerNorm
| Shape | Dtype | TileOPs (ms) | Separate Ops (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
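
As a rough guide to reading the Speedup column for the fused kernel, the sketch below counts full-tensor memory passes for the separate and fused paths. It assumes both paths are memory-bound and that the separate path materializes the sum before normalizing; the function name and the `write_residual` flag are hypothetical, and the ratio is an idealized bound, not a measured result.

```python
def ideal_traffic_ratio(write_residual: bool = False) -> float:
    """Ratio of full-tensor memory passes, separate ops over fused (assumed model)."""
    # Separate: add reads x and residual, writes the sum (3 passes);
    # LayerNorm then reads the sum and writes the output (2 passes).
    separate_passes = 3 + 2
    # Fused: reads x and residual, writes the output; optionally also
    # writes the updated residual for use by a later layer.
    fused_passes = 3 + (1 if write_residual else 0)
    return separate_passes / fused_passes
```

Under these assumptions, fusion saves at most the traffic of the intermediate tensor, so measured speedups well above this ratio would point to other effects such as reduced kernel-launch overhead.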