Have you looked at the training methods used by deepseek for their recent model? It ...

2025-01-21 02:56:12

Have you looked at the training methods DeepSeek used for their recent model? It splits training up horizontally across nodes with their DualPipe approach. Going by this write-up (https://adasci.org/deepseek-v3-explained-optimizing-efficiency-and-scale/), it might work at a more distributed scale, since it reportedly brings all-to-all communication overhead close to zero.

Author Public Key

npub1py0w8ymhdsll4tksvk69u0ethqdp32k4cdcmwhlaz8sfv6p8as8sc77ryh

Seen on

wss://relay.primal.net

papaslag on Nostr: Have you looked at the training methods used by deepseek for their recent model? It ...