Attention Operators

Supported architectures: Ampere (SM 80/86), Hopper (SM 90)
Supported dtypes: float16, bfloat16

Flash Attention

Multi-head attention forward pass using the Flash Attention algorithm.

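import torch
from tileops.ops import MultiHeadAttentionFwdOp

# Q, K, V are the query, key, and value tensors, created elsewhere
# (assumed here to be float16 to match the dtype passed to the op).
op = MultiHeadAttentionFwdOp(dtype=torch.float16)
output = op.forward(Q, K, V)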
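
For readers unfamiliar with the idea behind Flash Attention, the sketch below shows the core trick in plain Python: keys and values are visited block by block while a running maximum and running sum keep the softmax numerically exact, so the full score matrix is never materialized. It is an illustration only and is not the tileops kernel; the function names `naive_attention` and `tiled_attention`, the single-head list-of-lists layout, and the `block_size` parameter are all assumptions made for this example.

```python
import math


def naive_attention(Q, K, V):
    """Reference: softmax(q . k / sqrt(d)) weighted sum of V, per query row."""
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in K]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[j] for w, v in zip(weights, V)) / total
                    for j in range(len(V[0]))])
    return out


def tiled_attention(Q, K, V, block_size=2):
    """Same result, computed over K/V blocks with an online softmax."""
    d = len(Q[0])
    dv = len(V[0])
    out = []
    for q in Q:
        running_max = float("-inf")  # largest score seen so far
        running_sum = 0.0            # softmax denominator, relative to running_max
        acc = [0.0] * dv             # unnormalized output, relative to running_max
        for start in range(0, len(K), block_size):
            k_blk = K[start:start + block_size]
            v_blk = V[start:start + block_size]
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in k_blk]
            new_max = max(running_max, max(scores))
            # Rescale earlier partial results to the new maximum.
            # On the first block this is exp(-inf) == 0.0.
            scale = math.exp(running_max - new_max)
            probs = [math.exp(s - new_max) for s in scores]
            running_sum = running_sum * scale + sum(probs)
            acc = [a * scale + sum(p * v[j] for p, v in zip(probs, v_blk))
                   for j, a in enumerate(acc)]
            running_max = new_max
        out.append([a / running_sum for a in acc])
    return out
```

Both functions should agree up to floating-point rounding for any block size, which is what makes the tiled form a drop-in replacement for the naive one.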

Multi-head attention backward pass.

Optimized decode-phase attention with KV-cache support.
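
As a rough picture of what decode-phase attention with a KV-cache computes, the sketch below handles one newly generated token: its key and value are appended to the cache, and its single query attends over everything cached so far. This is a conceptual sketch in plain Python, not the tileops operator; `KVCache` and `decode_step` are hypothetical names introduced only for this example.

```python
import math


class KVCache:
    """Hypothetical per-sequence cache holding one key and one value per token."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def decode_step(q, k_new, v_new, cache):
    """Append the new token's key/value, then attend with its single query."""
    cache.append(k_new, v_new)
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in cache.keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, cache.values)) / total
            for j in range(len(v_new))]
```

Because earlier keys and values are reused from the cache, each step costs work proportional to the current sequence length rather than recomputing attention for the whole prefix.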

Decode attention with paged KV-cache for memory-efficient serving.
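
The general idea of a paged KV-cache can be sketched as follows: cache entries live in fixed-size pages drawn from a shared pool, and a per-sequence block table maps logical token positions to physical pages, so sequences grow without needing contiguous memory. The class below is a minimal plain-Python illustration under those assumptions; `PagedKVCache`, its methods, and its layout are hypothetical and do not describe the tileops data structures.

```python
class PagedKVCache:
    """Hypothetical paged cache: a shared page pool plus per-sequence block tables."""

    def __init__(self, num_pages, page_size):
        self.page_size = page_size
        self.pages = [[None] * page_size for _ in range(num_pages)]
        self.free_pages = list(range(num_pages))
        self.block_tables = {}  # seq_id -> list of physical page ids
        self.lengths = {}       # seq_id -> number of tokens stored

    def append(self, seq_id, kv):
        """Store one token's (key, value) entry, allocating a page on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        slot = length % self.page_size
        if slot == 0:  # no page yet, or the last page is full
            if not self.free_pages:
                raise MemoryError("no free pages")
            table.append(self.free_pages.pop())
        self.pages[table[-1]][slot] = kv
        self.lengths[seq_id] = length + 1

    def gather(self, seq_id):
        """Return the sequence's entries in logical order via its block table."""
        table = self.block_tables.get(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        return [self.pages[table[i // self.page_size]][i % self.page_size]
                for i in range(length)]

    def free(self, seq_id):
        """Return all of a finished sequence's pages to the pool."""
        self.free_pages.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)
```

At most one partially filled page is wasted per sequence, which is the memory-efficiency argument for paging when serving many requests of differing lengths.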

DeepSeek MLA (Multi-head Latent Attention) decode.
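
The sketch below gives a simplified view of the idea usually associated with Multi-head Latent Attention: the cache stores one small latent vector per token, and keys and values are reconstructed from it with up-projection matrices at decode time. It is a single-head illustration in plain Python that omits details such as the separate rotary-position key component, and it is not the tileops implementation; `matvec`, `mla_decode_step`, `W_uk`, and `W_uv` are names assumed for this example.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mla_decode_step(q, latent_cache, W_uk, W_uv):
    """Attend one query over keys/values rebuilt from cached latent vectors.

    latent_cache: one compressed latent vector per past token (assumed layout).
    W_uk, W_uv:   hypothetical up-projections from latent to key/value space.
    """
    keys = [matvec(W_uk, c) for c in latent_cache]
    values = [matvec(W_uv, c) for c in latent_cache]
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(len(values[0]))]
```

The memory saving comes from caching the latent vectors, which are smaller than the full keys and values they stand in for.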

DeepSeek DSA (Dynamic Sparse Attention) decode.