
2025-02-26 02:57:07

DeepGEMM is a CUDA library for efficient FP8 matrix multiplications (GEMMs) with fine-grained scaling, supporting both normal and Mixture-of-Experts (MoE) grouped GEMMs. The lightweight library matches or exceeds the performance of expert-tuned libraries, compiles its kernels at runtime, and targets Hopper tensor cores, while keeping its core kernel to roughly 300 lines.
https://github.com/deepseek-ai/DeepGEMM
#cuda #gpucomputing #matrixoperations #performanceoptimization #deeplearning
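
To make "fine-grained scaling" concrete, here is a minimal pure-Python sketch of the general idea: each block of values gets its own scale factor, so a block of tiny numbers and a block of large numbers can both use the full FP8 range. This is not DeepGEMM's API or kernel. The function names, the block width of 128, and the crude 4-significant-bit rounding used to imitate FP8 E4M3 are all assumptions made for illustration.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in the FP8 E4M3 format
BLOCK = 128           # assumed block width for this sketch


def round_to_e4m3_grid(x):
    """Crudely imitate E4M3 by keeping 4 significant bits (no subnormals)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)        # x == m * 2**e with 0.5 <= |m| < 1
    m = round(m * 16) / 16      # keep 4 significant bits of the mantissa
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, math.ldexp(m, e)))


def quantize_row(row, block=BLOCK):
    """Quantize a list of floats block by block; return (values, scales)."""
    q, scales = [], []
    for start in range(0, len(row), block):
        chunk = row[start:start + block]
        amax = max(abs(v) for v in chunk)
        # One scale per block maps the block's largest magnitude to FP8 max.
        scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        q.extend(round_to_e4m3_grid(v / scale) for v in chunk)
    return q, scales


def scaled_dot(qa, sa, qb, sb, block=BLOCK):
    """Dot product that rescales each block's partial sum by both scales."""
    total = 0.0
    for i, (s_a, s_b) in enumerate(zip(sa, sb)):
        lo = i * block
        partial = sum(x * y for x, y in zip(qa[lo:lo + block], qb[lo:lo + block]))
        total += partial * s_a * s_b  # undo the per-block scaling
    return total


# Hypothetical data: one block of small values followed by one block of large ones.
a = [0.001 * (1 + i % 7) for i in range(128)] + [50.0 * (1 + i % 5) for i in range(128)]
b = [1.0 + (i % 3) for i in range(256)]
qa, sa = quantize_row(a)
qb, sb = quantize_row(b)
approx = scaled_dot(qa, sa, qb, sb)
exact = sum(x * y for x, y in zip(a, b))
```

Because the small-valued block has its own scale, its entries are not flushed toward zero by the much larger values in the neighbouring block, which is the motivation for scaling per block rather than per tensor.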

Author Public Key

npub16wtj5hrk96w2kcw9gpxz7ee5sqpzhyyxp6kh08fltmh4e0n6weqq3hnfqk

Seen on

wss://relay.primal.net


renzume on Nostr