papaslag on Nostr: Have you looked at the training methods used by deepseek for their recent model? It ...
Published at
2025-01-21 02:56:12
Event JSON
{
"id": "0f0c35d563d0679ca226f6f76c943ec6dc764c01e6f4b769a967c965c3483128",
"pubkey": "091ee393776c3ffaaed065b45e3f2bb81a18aad5c371b75ffd11e0966827ec0f",
"created_at": 1737428172,
"kind": 1,
"tags": [
[
"e",
"c99cdfa47e19830c855e354f72025b0842295e8cfd835e266eb41678a710ef3b",
"wss://nos.lol/",
"root",
"091ee393776c3ffaaed065b45e3f2bb81a18aad5c371b75ffd11e0966827ec0f"
],
[
"e",
"e250dca62b4256c02825622732b45f49c275c00f6f02babe60419f128481c7f8",
"wss://relay.damus.io",
"reply"
],
[
"p",
"51910e567d6fa61d7e39298f93255acb20a7aa40b6148bcf847571eb8dbd2e36"
]
],
"content": "Have you looked at the training methods used by deepseek for their recent model? It splits the training up via horizontally per node with this DualPipe. From link (https://adasci.org/deepseek-v3-explained-optimizing-efficiency-and-scale/), might work at more distributed scale as it needs close to zero all-to-all communication \nhttps://m.primal.net/NsDu.jpg",
"sig": "9c5506fa255f192d2f0624f8066e742ad3d35b014a5c4b7dd8380b3740bc31d3795fa583d8fb8578cc2bfd0979b7e85de7eab6001f96576c54e49ac1c3312746"
}
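For readers who want to relate the raw event fields to what the page displays, here is a minimal sketch. It assumes `created_at` is a Unix timestamp in seconds (UTC) and that the fourth element of an `e` tag is the thread marker (`root` or `reply`), as the Nostr conventions describe. The helper names `published_at` and `thread_refs` are hypothetical and not part of any Nostr library.

```python
import json
from datetime import datetime, timezone

# Subset of the event above; only the fields this sketch reads.
event = json.loads("""
{
  "created_at": 1737428172,
  "kind": 1,
  "tags": [
    ["e", "c99cdfa47e19830c855e354f72025b0842295e8cfd835e266eb41678a710ef3b",
     "wss://nos.lol/", "root",
     "091ee393776c3ffaaed065b45e3f2bb81a18aad5c371b75ffd11e0966827ec0f"],
    ["e", "e250dca62b4256c02825622732b45f49c275c00f6f02babe60419f128481c7f8",
     "wss://relay.damus.io", "reply"],
    ["p", "51910e567d6fa61d7e39298f93255acb20a7aa40b6148bcf847571eb8dbd2e36"]
  ]
}
""")


def published_at(ev: dict) -> str:
    """Format created_at (assumed Unix seconds, UTC) as a readable timestamp."""
    moment = datetime.fromtimestamp(ev["created_at"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def thread_refs(ev: dict) -> dict:
    """Map each marker on an "e" tag (assumed to be element 3) to its event id."""
    refs = {}
    for tag in ev["tags"]:
        if tag[0] == "e" and len(tag) >= 4:
            refs[tag[3]] = tag[1]
    return refs


print(published_at(event))
for marker, event_id in thread_refs(event).items():
    print(marker, event_id)
```

Read this way, the note is a reply within a thread: the `root` tag points at the first post of the conversation and the `reply` tag at the specific note being answered.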