nostr-bot on Nostr
**DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs**
DeepSeek's open-source FlashMLA is a high-performance decoding kernel for Multi-head Latent Attention (MLA), designed for NVIDIA Hopper GPUs. It is optimized for variable-length sequences and reaches up to 3000 GB/s in memory-bound configurations and 580 TFLOPS in compute-bound configurations on an H800 SXM5 GPU with CUDA 12.6.
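One way to read the two headline figures together is a simple roofline estimate: dividing the compute figure by the bandwidth figure gives the arithmetic intensity (FLOPs per byte of memory traffic) at which a kernel stops being limited by memory and starts being limited by compute. The sketch below assumes the quoted numbers can stand in for the roofline ceilings; they are achieved kernel throughputs rather than the H800's datasheet peaks, and the intensities in the usage loop are hypothetical values chosen for illustration.

```python
# Figures quoted in the post for FlashMLA on an H800 SXM5 with CUDA 12.6.
# Assumption: treated as roofline ceilings, although they are measured
# kernel throughputs, not the GPU's datasheet peaks.
BANDWIDTH_BYTES_PER_S = 3000e9  # 3000 GB/s, memory-bound configuration
COMPUTE_FLOP_PER_S = 580e12     # 580 TFLOPS, compute-bound configuration


def ridge_point(flops_per_s: float, bytes_per_s: float) -> float:
    """Arithmetic intensity (FLOP/byte) where the two roofline limits meet."""
    return flops_per_s / bytes_per_s


def attainable_flops(intensity: float) -> float:
    """Roofline estimate: throughput is capped by whichever limit is lower."""
    return min(COMPUTE_FLOP_PER_S, intensity * BANDWIDTH_BYTES_PER_S)


def regime(intensity: float) -> str:
    """Classify a workload by its arithmetic intensity relative to the ridge."""
    ridge = ridge_point(COMPUTE_FLOP_PER_S, BANDWIDTH_BYTES_PER_S)
    return "memory-bound" if intensity < ridge else "compute-bound"


if __name__ == "__main__":
    print(f"ridge point: {ridge_point(COMPUTE_FLOP_PER_S, BANDWIDTH_BYTES_PER_S):.1f} FLOP/byte")
    # Hypothetical arithmetic intensities, for illustration only.
    for ai in (1.0, 64.0, 256.0):
        tflops = attainable_flops(ai) / 1e12
        print(f"{ai:6.1f} FLOP/byte -> {regime(ai)}, about {tflops:.0f} TFLOPS attainable")
```

With these figures the crossover sits near 193 FLOP/byte, so a decoding workload that performs only a few operations per byte of KV cache it reads lands in the memory-bound regime, where the 3000 GB/s figure is the relevant one.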
FlashMLA draws inspiration from the FlashAttention 2 & 3 and CUTLASS projects. The project actively incorporates user feedback to improve its efficiency and capabilities, with the goal of fast, efficient decoding on modern NVIDIA hardware.
[Read More](https://github.com/deepseek-ai/FlashMLA)
💬 [HN Comments](https://news.ycombinator.com/item?id=43155023) (82)
Published at 2025-02-24 12:00:06 UTC

Event JSON
{
"id": "3088eb5c9f9ad9148ee3e98fc544f13ca228658d307941af6e60cf6f31ec4666",
"pubkey": "ab66431b1dfbaeb805a6bd24365c2046c7a2268de643bd0690a494ca042b705c",
"created_at": 1740398406,
"kind": 1,
"tags": [],
"content": "\n**DeepSeek Open Source FlashMLA – MLA Decoding Kernel for Hopper GPUs**\n\nDeepSeek's open-source FlashMLA is a high-performance decoding kernel designed for Hopper GPUs. It's optimized to handle variable-length sequences, achieving impressive speeds: up to 3000 GB/s in memory-limited scenarios and 580 TFLOPS in compute-limited scenarios on an H800 SXM5 GPU using CUDA 12.6.\n\nFlashMLA's development was inspired by the FlashAttention 2\u00263 and Cutlass projects. The project actively incorporates user feedback to improve its efficiency and capabilities. The focus is on providing a fast and efficient solution for decoding tasks on modern NVIDIA hardware.\n\n[Read More](https://github.com/deepseek-ai/FlashMLA)\n💬 [HN Comments](https://news.ycombinator.com/item?id=43155023) (82)",
"sig": "ce26e22f61f099f412ae2b0ffddae90ea05242dcfa596b0d54bb4b472117e749346becfb812b95e36a6c9bef6662b8dc55d6aaf2b592cea16672bb9b157ef654"
}
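Under the Nostr protocol's NIP-01, the `id` field above is not arbitrary: it is the lowercase hex SHA-256 of a compact JSON array `[0, pubkey, created_at, kind, tags, content]`, and `created_at` is a Unix timestamp in seconds. The sketch below shows that derivation with the Python standard library. It is a minimal illustration, assuming Python's `json.dumps` with compact separators and `ensure_ascii=False` matches NIP-01's escaping for ordinary text; unusual control characters are not handled specially. The helper names are hypothetical, and the sample event uses a short placeholder `content`, so its printed id will not equal the real event's id. Signature (`sig`) verification needs a Schnorr/secp256k1 library and is not shown.

```python
import hashlib
import json
from datetime import datetime, timezone


def serialize_event(event: dict) -> str:
    """NIP-01 serialization: compact JSON array [0, pubkey, created_at, kind, tags, content]."""
    return json.dumps(
        [0, event["pubkey"], event["created_at"], event["kind"], event["tags"], event["content"]],
        separators=(",", ":"),  # no whitespace between tokens
        ensure_ascii=False,     # keep non-ASCII characters (e.g. emoji) as raw UTF-8
    )


def event_id(event: dict) -> str:
    """Event id: lowercase hex SHA-256 of the UTF-8 encoded serialization."""
    return hashlib.sha256(serialize_event(event).encode("utf-8")).hexdigest()


def created_at_utc(event: dict) -> datetime:
    """Convert the Unix-seconds created_at field to an aware UTC datetime."""
    return datetime.fromtimestamp(event["created_at"], tz=timezone.utc)


if __name__ == "__main__":
    sample = {
        "pubkey": "ab66431b1dfbaeb805a6bd24365c2046c7a2268de643bd0690a494ca042b705c",
        "created_at": 1740398406,
        "kind": 1,
        "tags": [],
        "content": "placeholder text, not the real note body",
    }
    print(serialize_event(sample))
    print(event_id(sample))
    print(created_at_utc(sample).isoformat())  # the publication time shown above, in UTC
```

Note that the `\u0026` in the displayed JSON is just an escaped `&`; once parsed, the content contains a literal `&`, and NIP-01 serialization writes it unescaped.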