GEMM Benchmarks
Performance results for matrix multiplication operators.
Note
This page is a placeholder. The tables below will be populated with benchmark data from nightly CI runs.
Dense GEMM
| Shape (M, N, K) | Dtype | TileOPs (ms) | cuBLAS (ms) | Speedup | TFLOPS |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
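
The page does not define the derived columns of this table. A minimal sketch of how Speedup and TFLOPS are commonly computed, assuming the conventional 2·M·N·K FLOP count for a dense GEMM and latencies in milliseconds. The helper names `gemm_tflops` and `speedup` are hypothetical and not part of TileOPs or cuBLAS; the nightly CI may use a different convention.

```python
def gemm_tflops(m: int, n: int, k: int, latency_ms: float) -> float:
    """Achieved TFLOPS for an (M, N, K) GEMM.

    Assumption: one multiply and one add per (m, n, k) triple, i.e. 2*M*N*K FLOPs.
    """
    flops = 2 * m * n * k
    seconds = latency_ms * 1e-3
    return flops / seconds / 1e12


def speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Baseline latency divided by candidate latency.

    A value above 1 means the candidate (here, TileOPs) is faster than the
    baseline (here, cuBLAS).
    """
    return baseline_ms / candidate_ms
```

Under this convention, a Speedup below 1 indicates that the baseline was faster for that shape and dtype.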
GEMV
| Shape (M, N) | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | Bandwidth (GB/s) |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
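
GEMV is typically memory-bound, which is presumably why this table reports bandwidth rather than TFLOPS. A rough sketch of one way to derive the Bandwidth (GB/s) column, assuming `y = A @ x` with `A` of shape (M, N), that each operand is moved through memory exactly once, and that 1 GB = 10^9 bytes. The function name and the byte-count model are assumptions, not the documented TileOPs methodology.

```python
def gemv_bandwidth_gbs(m: int, n: int, bytes_per_element: int,
                       latency_ms: float) -> float:
    """Effective memory bandwidth in GB/s for y = A @ x, A of shape (M, N).

    Assumption: A (M*N elements) and x (N elements) are each read once,
    and y (M elements) is written once.
    """
    elements_moved = m * n + n + m
    bytes_moved = elements_moved * bytes_per_element
    seconds = latency_ms * 1e-3
    return bytes_moved / seconds / 1e9
```

For 16-bit dtypes such as float16 or bfloat16, `bytes_per_element` would be 2; for float32 it would be 4.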
Grouped GEMM
| Groups × Shape | Dtype | TileOPs (ms) | PyTorch (ms) | Speedup | TFLOPS |
|---|---|---|---|---|---|
| — | — | — | — | — | — |
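
For a grouped GEMM, the TFLOPS figure is usually based on the FLOPs summed over all groups and divided by the latency of the single grouped launch. A sketch under that assumption, where each group is an independent (M, N, K) problem. The function name `grouped_gemm_tflops` is hypothetical, and the page does not state whether groups share a shape.

```python
from typing import Iterable, Tuple


def grouped_gemm_tflops(shapes: Iterable[Tuple[int, int, int]],
                        latency_ms: float) -> float:
    """Aggregate TFLOPS over a list of (M, N, K) problems run as one grouped call.

    Assumption: 2*M*N*K FLOPs per group, and latency_ms covers all groups.
    """
    total_flops = sum(2 * m * n * k for m, n, k in shapes)
    seconds = latency_ms * 1e-3
    return total_flops / seconds / 1e12
```

Passing `[(m, n, k)] * groups` covers the case where every group has the same shape, while a heterogeneous list covers variable-size groups.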