renzume on Nostr: Detailed profiling data from a training and inference framework is shared, ...
Detailed profiling data from a training and inference framework has been shared, highlighting communication-computation overlap strategies through PyTorch Profiler visualizations. The framework implements DualPipe with MoE layers across several configurations, including EP64/TP1 (64-way expert parallelism, no tensor parallelism) for training and EP32/TP1 for prefilling, and demonstrates balanced routing and micro-batch optimization techniques.
https://github.com/deepseek-ai/profile-data
#performanceprofiling #deeplearning #moearchitecture #pytorch #parallelcomputing
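To make the communication-computation overlap idea concrete, here is an idealized timing model in Python. It is a deliberately simplified two-stage pipeline, not the DualPipe schedule itself: it assumes each micro-batch needs a fixed, hypothetical amount of compute time and all-to-all communication time, and that communication for one micro-batch can run concurrently with compute for the next. The function names and the numbers are illustrative only, not figures from the profiling data.

```python
# Idealized two-stage pipeline model of communication-computation overlap.
# Hypothetical assumptions (not taken from the profile data): every
# micro-batch needs `compute_ms` of GPU compute followed by `comm_ms` of
# all-to-all communication, and compute for one micro-batch may run
# concurrently with communication for another.


def serial_time(n_micro: int, compute_ms: float, comm_ms: float) -> float:
    """No overlap: each micro-batch computes, then communicates."""
    return n_micro * (compute_ms + comm_ms)


def overlapped_time(n_micro: int, compute_ms: float, comm_ms: float) -> float:
    """Two-stage pipeline: while micro-batch i communicates, micro-batch
    i+1 computes. Only the first compute and the last communication are
    fully exposed; every step in between is bounded by the slower stage."""
    if n_micro == 0:
        return 0.0
    return compute_ms + comm_ms + (n_micro - 1) * max(compute_ms, comm_ms)


def main() -> None:
    n, c, m = 8, 10.0, 8.0  # hypothetical micro-batch count and timings
    print(f"serial:     {serial_time(n, c, m):.1f} ms")      # 144.0 ms
    print(f"overlapped: {overlapped_time(n, c, m):.1f} ms")  # 88.0 ms


if __name__ == "__main__":
    main()
```

With these made-up numbers, overlap hides all but the last micro-batch's communication, and the total is bounded by the slower of the two stages. If communication grows larger than compute, the pipeline becomes communication-bound, which is one reason uneven expert load (and therefore uneven all-to-all volume per rank) can erode the benefit.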
Published at 2025-02-27 19:22:16
Event JSON
{
"id": "925f8865b1bd3692bc9ba4ab71458b7a40ea4798fa24e7bff08808f8d602bc98",
"pubkey": "d3972a5c762e9cab61c5404c2f673480022b90860ead779d3f5eef5cbe7a7640",
"created_at": 1740684136,
"kind": 1,
"tags": [],
"content": "Detailed profiling data from a training and inference framework is shared, highlighting communication-computation overlap strategies with PyTorch Profiler visualizations. The framework implements DualPipe with MoE layers across different configurations, including EP64/TP1 for training and EP32/TP1 for prefilling, demonstrating balanced routing and micro-batch optimization techniques.\nhttps://github.com/deepseek-ai/profile-data\n#performanceprofiling #deeplearning #moearchitecture #pytorch #parallelcomputing",
"sig": "f20736ae36f459f7bdbb6d72e6dd430e7c7dd930671694029cae948a5b31ef119a1302da099cf71c414dbb38c421610785b15426c8bacb793c769fbb4d4b43cd"
}
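The event's "id" field is, per the Nostr NIP-01 specification, the lowercase hex SHA-256 of the compact UTF-8 JSON array [0, pubkey, created_at, kind, tags, content], and "created_at" is a Unix timestamp in seconds. The Python sketch below recomputes both from the fields above using only the standard library. It is an illustration, not a full Nostr client: the helper names are hypothetical, it relies on `json.dumps` escaping, which matches NIP-01 for ordinary content like this post but not for unusual control characters, and it skips verifying "sig" (a BIP-340 Schnorr signature that needs a secp256k1 library).

```python
import hashlib
import json
from datetime import datetime, timezone

# Fields copied from the event above; "sig" is omitted because checking the
# BIP-340 Schnorr signature would require a third-party secp256k1 library.
EVENT = {
    "id": "925f8865b1bd3692bc9ba4ab71458b7a40ea4798fa24e7bff08808f8d602bc98",
    "pubkey": "d3972a5c762e9cab61c5404c2f673480022b90860ead779d3f5eef5cbe7a7640",
    "created_at": 1740684136,
    "kind": 1,
    "tags": [],
    "content": (
        "Detailed profiling data from a training and inference framework is "
        "shared, highlighting communication-computation overlap strategies "
        "with PyTorch Profiler visualizations. The framework implements "
        "DualPipe with MoE layers across different configurations, including "
        "EP64/TP1 for training and EP32/TP1 for prefilling, demonstrating "
        "balanced routing and micro-batch optimization techniques.\n"
        "https://github.com/deepseek-ai/profile-data\n"
        "#performanceprofiling #deeplearning #moearchitecture #pytorch "
        "#parallelcomputing"
    ),
}


def nip01_serialize(event: dict) -> str:
    """Compact JSON of [0, pubkey, created_at, kind, tags, content] with no
    extra whitespace; non-ASCII characters are kept as UTF-8 text."""
    return json.dumps(
        [0, event["pubkey"], event["created_at"], event["kind"],
         event["tags"], event["content"]],
        separators=(",", ":"),
        ensure_ascii=False,
    )


def event_id(event: dict) -> str:
    """Lowercase hex SHA-256 of the UTF-8 encoded serialization."""
    return hashlib.sha256(nip01_serialize(event).encode("utf-8")).hexdigest()


def published_at(event: dict) -> str:
    """Render created_at (Unix seconds) as a UTC timestamp string."""
    ts = datetime.fromtimestamp(event["created_at"], tz=timezone.utc)
    return ts.strftime("%Y-%m-%d %H:%M:%S")


if __name__ == "__main__":
    computed = event_id(EVENT)
    print("computed id:", computed)
    print("matches listed id:", computed == EVENT["id"])
    print("published at (UTC):", published_at(EVENT))  # 2025-02-27 19:22:16
```

Interpreting created_at as UTC reproduces the "Published at" time shown above.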