renzume on Nostr: DeepGEMM is a CUDA library offering efficient FP8 matrix multiplications with ...
DeepGEMM is a CUDA library offering efficient FP8 matrix multiplications with fine-grained scaling, supporting both normal and Mixture-of-Experts (MoE) GEMMs. The lightweight library matches or exceeds the performance of expert-tuned libraries, featuring runtime compilation and Hopper tensor core optimization, while maintaining a simple ~300-line core kernel.
https://github.com/deepseek-ai/DeepGEMM
#cuda #gpucomputing #matrixoperations #performanceoptimization #deeplearning
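For readers unfamiliar with "fine-grained scaling", here is a rough NumPy sketch of the general idea: each small block of a matrix gets its own scale factor before being rounded to FP8, so one outlier only degrades its own block. This is not DeepGEMM's code or API. The function names, the block size of 128, and the software emulation of FP8 E4M3 rounding are all assumptions made for illustration.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude in the FP8 E4M3 format


def fake_fp8_e4m3(x):
    """Round values onto the FP8 E4M3 grid (emulation only; result stays float64)."""
    x = np.clip(x, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    _, e = np.frexp(x)           # |x| lies in [2**(e-1), 2**e)
    e = np.maximum(e, -5)        # below 2**-6 the format is subnormal: fixed spacing
    step = np.ldexp(1.0, e - 4)  # 3 mantissa bits -> 8 steps per power of two
    return np.round(x / step) * step


def quantize_blockwise(x, block=128):
    """Give every 1 x `block` slice of each row its own scale (hypothetical helper)."""
    rows, cols = x.shape
    assert cols % block == 0, "columns must be a multiple of the block size"
    blocks = x.reshape(rows, cols // block, block)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    # Map each block's largest magnitude to the FP8 maximum; avoid dividing by zero.
    scale = np.where(amax > 0, amax / FP8_E4M3_MAX, 1.0)
    return fake_fp8_e4m3(blocks / scale), scale


def dequantize_blockwise(q, scale):
    """Undo the per-block scaling and restore the 2-D shape."""
    return (q * scale).reshape(q.shape[0], -1)


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 256))
x[0, 0] = 1e6  # a single outlier

# Fine-grained: one scale per 128 values.
fine = dequantize_blockwise(*quantize_blockwise(x, block=128))
# Coarse: one scale for the whole tensor (flatten, then use a single block).
coarse = dequantize_blockwise(
    *quantize_blockwise(x.reshape(1, -1), block=x.size)
).reshape(x.shape)

print("mean abs error, per-block scale: ", np.abs(fine - x).mean())
print("mean abs error, per-tensor scale:", np.abs(coarse - x).mean())
```

With a single scale for the whole tensor, the outlier forces every ordinary value toward the bottom of the FP8 range, where most of them round to zero. With per-block scales, only the block containing the outlier pays that cost.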
Published at 2025-02-26 02:57:07
Event JSON
{
"id": "dfe0ffe930320a8526974adb33e13571185fb7b9d458d418a61632aaa13eb1d5",
"pubkey": "d3972a5c762e9cab61c5404c2f673480022b90860ead779d3f5eef5cbe7a7640",
"created_at": 1740538627,
"kind": 1,
"tags": [],
"content": "DeepGEMM is a CUDA library offering efficient FP8 matrix multiplications with fine-grained scaling, supporting both normal and Mix-of-Experts GEMMs. The lightweight library matches or exceeds performance of expert-tuned libraries, featuring runtime compilation and Hopper tensor core optimization, while maintaining a simple ~300-line core kernel.\nhttps://github.com/deepseek-ai/DeepGEMM\n#cuda #gpucomputing #matrixoperations #performanceoptimization #deeplearning",
"sig": "19c314562549657e02192fecb6c4550918276e54907c58361b851f00b1075532b98449188223fc1c27780fb9de5741574406ddd72d94c8527e75751589b9f7fc"
}