What is Nostr?
papaslag
npub1py0…7ryh
2025-01-21 02:56:12
in reply to nevent1q…g0gc

papaslag on Nostr: Have you looked at the training methods used by deepseek for their recent model? It ...

Have you looked at the training methods used by deepseek for their recent model? It splits the training up via horizontally per node with this DualPipe. From link (https://adasci.org/deepseek-v3-explained-optimizing-efficiency-and-scale/), might work at more distributed scale as it needs close to zero all-to-all communication
Author Public Key
npub1py0w8ymhdsll4tksvk69u0ethqdp32k4cdcmwhlaz8sfv6p8as8sc77ryh