cathiedwood on Nostr:
If successful, this incredible ramp in compute will increase the odds that #Tesla will be the first to roll out a nationwide autonomous taxi platform, which we believe is a winner-take-most #AI-driven opportunity with SaaS-like margins.
Quoting @downingARK:
Tesla forecasts ramping up to 100 exaflops of AI training capacity by Q4 2024. That's a huge number: it implies more than a 20x scale-up from the 4.6 exaflops disclosed at AI Day 2022 (14k Nvidia A100 GPUs), and more than 50x the A100 capacity discussed in 2021.
This implies compound annual growth in training capacity of 273% from 2021 to 2024 (if they hit their target), based on the 5,760 A100s they disclosed at CVPR 21.
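A quick sanity check of those multiples, sketched in Python. The per-A100 throughput (~312 teraflops of dense BF16, from Nvidia's spec sheet) and the three-year span are my assumptions, not the thread's, so the computed CAGR lands near, not exactly on, the 273%:

```python
# Back-of-the-envelope check of the growth figures above.
# Assumptions (mine, not from the thread): one A100 ~= 312 teraflops
# of dense BF16 training compute, and a 2021 -> 2024 span of 3 years.
# The exact CAGR depends on both choices, so treat this as a shape
# check on the ramp, not a reproduction of the 273%.

A100_TFLOPS = 312                           # dense BF16, per Nvidia spec
baseline_2021 = 5_760 * A100_TFLOPS / 1e6   # CVPR 21 fleet, in exaflops
ai_day_2022 = 4.6                           # exaflops disclosed at AI Day 2022
target_2024 = 100                           # exaflops forecast for Q4 2024
years = 3

print(f"2021 baseline: ~{baseline_2021:.1f} exaflops")
print(f"Scale-up vs AI Day 2022: {target_2024 / ai_day_2022:.0f}x")   # >20x
print(f"Scale-up vs 2021:        {target_2024 / baseline_2021:.0f}x") # >50x

cagr = (target_2024 / baseline_2021) ** (1 / years) - 1
print(f"Implied CAGR 2021->2024: ~{cagr:.0%}")
```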
Here are some additional thoughts / read-throughs:
1. At AI Day 2022, Tesla projected to have their first Dojo exapod in production by Q1 23, scaling up to 7 exapods in Palo Alto over time. The July 23 production date on the chart, assuming it isn't ignoring an already-in-production pod, implies Tesla is a bit behind on exapod #1, but plans a faster, larger ramp to ~28 exapods by Q1 24 to reach the capability of 100k A100s (see the arithmetic sketch after this list).
2. If the new capacity is all Dojo (and not a Dojo + Nvidia mix), ~300k A100s of equivalent performance is just under some estimates of Nvidia's total A100 SXM shipments over the last 12 months. NextPlatform estimates ~350k sales of server GPUs of the variety Tesla is augmenting / replacing with Dojo; I'm assuming most of these were A100s, as the H100 only started ramping recently.
3. The chart appears to use flops as the measure of training compute. In reality, a lot more goes into end-use performance than raw flops. At AI Day 2022, Tesla estimated that 4 Dojo cabinets (0.4 exaflops) could replace 4,000 A100s (1.2 exaflops) for autolabelling, thanks to the increased compute density and software optimizations Tesla expected to achieve. So either they haven't achieved the optimizations they thought they could, or 2024 training capability could be even higher than this chart suggests (in terms of A100-equivalent capability).
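Here's a sketch of the arithmetic behind points 1 and 3. The ~1.1 exaflops per exapod (AI Day 2022 spec) and ~312 teraflops per A100 are my spec-sheet assumptions, not figures stated in the thread:

```python
# Sketch of the exapod math in points 1 and 3.
# Assumptions (mine): one Dojo exapod ~= 1.1 exaflops of BF16
# (AI Day 2022 spec), one A100 ~= 312 teraflops dense BF16.

EXAPOD_EF = 1.1          # exaflops per Dojo exapod
A100_EF = 312 / 1e6      # exaflops per A100

# Point 1: ~28 exapods vs. the 100k-A100 capability target.
pods = 28
print(f"{pods} exapods ~= {pods * EXAPOD_EF:.0f} EF "
      f"~= {pods * EXAPOD_EF / A100_EF / 1e3:.0f}k A100s")  # ~99k, checks out

# Point 3: the AI Day autolabelling claim implies Dojo does the work
# of ~3x its raw flops (0.4 EF of Dojo replacing 1.2 EF of A100s).
dojo_ef, a100_ef_replaced = 0.4, 1.2
factor = a100_ef_replaced / dojo_ef
print(f"Implied Dojo efficiency factor: ~{factor:.0f}x")

# If that 3x held for training, the same 28 pods would read as ~3x
# more in A100-equivalent terms -- the upside case in point 3.
print(f"A100-equivalent with the 3x factor: "
      f"~{pods * EXAPOD_EF * factor / A100_EF / 1e3:.0f}k A100s")
```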
Additional assumptions:
- 28 exapods by Q1 24 assumes Tesla is exclusively adding Dojo capacity after July, and hasn't added Nvidia capacity since August. The number of exapods would be lower if they plan to ramp Nvidia capacity (i.e. H100s) in parallel, especially considering H100s are 4-6x more performant than A100s at AI training, per Nvidia.
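To make that sensitivity concrete, a small sketch of how hypothetical H100 additions would shrink the implied exapod count. The H100 quantities and the 5x midpoint uplift are illustrative assumptions of mine; only the 4-6x range comes from the thread:

```python
# Sensitivity of the exapod count to parallel H100 additions.
# Assumptions (mine): ~1.1 EF per exapod, ~312 TF per A100, and a 5x
# H100-vs-A100 training uplift (midpoint of the 4-6x cited above).
# The H100 counts below are purely illustrative.

EXAPOD_EF = 1.1
A100_EF = 312 / 1e6
H100_UPLIFT = 5

target_ef = 100_000 * A100_EF   # Q1 24 milestone: 100k-A100 capability

for h100s in (0, 2_000, 5_000, 10_000):   # hypothetical H100 adds
    h100_ef = h100s * A100_EF * H100_UPLIFT
    pods = max(target_ef - h100_ef, 0) / EXAPOD_EF
    print(f"{h100s:>6,} H100s -> ~{pods:.0f} exapods needed")
# 0 H100s gives ~28 exapods, matching the all-Dojo assumption above.
```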
Sources:
Nvidia Unit Sales: https://www.nextplatform.com/2023/05/01/just-how-big-are-nvidias-server-and-networking-businesses/
AI Day 2022: https://piped.video/watch?v=ODSJsviD_SU&t=8650s
CVPR 21: https://piped.video/watch?v=eOL_rCK59ZI&t=29533s