Normalization Benchmarks

Note

This page will be populated with benchmark data from nightly CI runs.

LayerNorm

| Shape (B, S, D) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|

RmsNorm

| Shape (B, S, D) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|

Fused Add + LayerNorm

| Shape | Dtype | TileOPs (ms) | Separate Ops (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
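
The Speedup and Bandwidth (GB/s) columns are derived from the measured latencies. This page does not state the exact formulas, so the following is only a minimal sketch of one plausible derivation. It assumes speedup is baseline latency divided by TileOPs latency, that bandwidth counts one read of the input plus one write of the output (ignoring the small weight and bias vectors), and that 1 GB = 10^9 bytes. The helper names `speedup` and `norm_bandwidth_gbs` and the `tensors_moved` parameter are hypothetical and not part of TileOPs.

```python
# Hypothetical helpers illustrating how the derived columns could be computed.
# These formulas are assumptions, not the benchmark harness's actual code.

# Bytes per element for the dtypes such a benchmark would typically cover.
DTYPE_BYTES = {"float32": 4, "float16": 2, "bfloat16": 2}


def speedup(baseline_ms: float, tileops_ms: float) -> float:
    """Ratio of baseline latency to TileOPs latency (values > 1 mean TileOPs is faster)."""
    return baseline_ms / tileops_ms


def norm_bandwidth_gbs(shape, dtype, latency_ms, tensors_moved=2):
    """Effective memory bandwidth in GB/s for a (B, S, D) normalization.

    tensors_moved is the assumed number of full-size tensors read or written:
    2 for a plain norm (read x, write y). A fused add + norm moves more
    (for example x, residual, and y), so pass a larger value there.
    """
    b, s, d = shape
    bytes_moved = tensors_moved * b * s * d * DTYPE_BYTES[dtype]
    seconds = latency_ms * 1e-3
    return bytes_moved / seconds / 1e9


if __name__ == "__main__":
    # Illustrative inputs only; these are not measured results.
    print(speedup(baseline_ms=2.0, tileops_ms=1.0))
    print(norm_bandwidth_gbs((1, 1024, 1024), "float16", latency_ms=1.0))
```

Because normalization kernels are typically memory-bound, comparing this effective bandwidth against the device's peak memory bandwidth gives a rough sense of how close a kernel is to the hardware limit.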