News Nug

CUDA: add fast walsh-hadamard transform by am17an · Pull Request #23615 · ggml-org/llama.cpp

r/LocalLLaMA · 23h ago · 7 · inference optimization cuda open source benchmark

Discussion of FWHT (Fast Walsh-Hadamard Transform) CUDA kernel implementation for quantized KV-cache in LLM inference, with performance benchmarks across different model architectures and head sizes. Shows practical optimization work for inference speed-ups when using q8_0 quantization on different GPU architectures (RTX 5090, CDNA).