Andrej Karpathy / @karpathy (RSS Feed) on Nostr: **RT @natfriedman:** Watching llama.cpp do 40 tok/s inference of the 7B model on my ...
**RT @natfriedman:**
Watching llama.cpp do 40 tok/s inference of the 7B model on my M2 Max, with 0% CPU usage, and using all 38 GPU cores.
Congratulations @ggerganov (https://nitter.moomoo.me/ggerganov)! This is a triumph.
github.com/ggerganov/llama.c… (https://github.com/ggerganov/llama.cpp/pull/1642)
https://nitter.moomoo.me/natfriedman/status/1665402680376987648#m