renzume / renzume.
npub16wt…nfqk
2025-02-26 02:57:07

DeepGEMM is a CUDA library for efficient FP8 general matrix multiplications (GEMMs) with fine-grained scaling, supporting both normal (dense) GEMMs and Mixture-of-Experts (MoE) GEMMs. The lightweight library matches or exceeds the performance of expert-tuned libraries. It compiles its kernels at runtime and is optimized for NVIDIA Hopper tensor cores, yet its core kernel is only about 300 lines.
https://github.com/deepseek-ai/DeepGEMM
#cuda #gpucomputing #matrixoperations #performanceoptimization #deeplearning
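"Fine-grained scaling" means each small block of an operand gets its own scale factor before being cast to FP8, instead of using one scale for the whole tensor. This keeps outliers in one block from wrecking precision everywhere else. The pure-Python sketch below illustrates the idea on the CPU. It assumes the FP8 E4M3 format (largest finite value 448) and a hypothetical block size `blk` along the shared K dimension for both operands. The function names are illustrative only and are not DeepGEMM's API. The library's actual scaling granularity, memory layouts, and accumulation strategy may differ, so consult the repository for those details.

```python
import math

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def round_to_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3 value (simulated in float64, saturating at 448)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    # frexp gives a = m * 2**e with 0.5 <= m < 1, so floor(log2(a)) == e - 1
    _, e = math.frexp(a)
    exp = max(e - 1, -6)          # below 2**-6 the format is subnormal (fixed step)
    step = 2.0 ** (exp - 3)       # 3 mantissa bits -> 8 steps per binade
    q = round(a / step) * step    # Python's round() is round-half-to-even
    return sign * min(q, E4M3_MAX)

def quantize_blocks(vec, blk):
    """Split vec into blocks of length blk; return (FP8 values per block, scale per block)."""
    vals, scales = [], []
    for start in range(0, len(vec), blk):
        block = vec[start:start + blk]
        amax = max(abs(v) for v in block)
        # hypothetical choice: map each block's max magnitude onto E4M3_MAX
        scale = amax / E4M3_MAX if amax > 0 else 1.0
        scales.append(scale)
        vals.append([round_to_e4m3(v / scale) for v in block])
    return vals, scales

def fp8_gemm_blockscaled(A, B, blk):
    """C = A @ B, with each row of A and each column of B scaled per block of blk along K."""
    M, K, N = len(A), len(B), len(B[0])
    qa = [quantize_blocks(row, blk) for row in A]
    cols = [[B[k][j] for k in range(K)] for j in range(N)]
    qb = [quantize_blocks(col, blk) for col in cols]
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        a_vals, a_scales = qa[i]
        for j in range(N):
            b_vals, b_scales = qb[j]
            acc = 0.0  # high-precision accumulator
            for b in range(len(a_vals)):
                # partial dot product of FP8 values inside one K-block
                partial = sum(x * y for x, y in zip(a_vals[b], b_vals[b]))
                # apply both block scales when adding into the accumulator
                acc += partial * a_scales[b] * b_scales[b]
            C[i][j] = acc
    return C

if __name__ == "__main__":
    import random
    random.seed(0)
    M, K, N, blk = 2, 8, 3, 4
    A = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(M)]
    B = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(K)]
    C = fp8_gemm_blockscaled(A, B, blk)
    ref = [[sum(A[i][k] * B[k][j] for k in range(K)) for j in range(N)] for i in range(M)]
    err = max(abs(C[i][j] - ref[i][j]) for i in range(M) for j in range(N))
    print("max abs error vs float64:", err)
```

Applying the scales once per block, rather than once per element, is what keeps the inner loop on cheap low-precision products while still tracking each block's dynamic range.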
Author Public Key
npub16wtj5hrk96w2kcw9gpxz7ee5sqpzhyyxp6kh08fltmh4e0n6weqq3hnfqk