**RT @natfriedman:** Watching llama.cpp do 40 tok/s inference of the 7B model on my ...

2023-06-04 16:58:35

**RT @natfriedman:**

Watching llama.cpp do 40 tok/s inference of the 7B model on my M2 Max, with 0% CPU usage, and using all 38 GPU cores.

Congratulations @ggerganov (https://nitter.moomoo.me/ggerganov)! This is a triumph.

github.com/ggerganov/llama.c… (https://github.com/ggerganov/llama.cpp/pull/1642)

Author Public Key

npub1rj7u39tvjdgfpzg3c3xfym6vzalt34p7t5uvdsqhzgst9jtl7dgqs2ffmk

Andrej Karpathy / @karpathy (RSS Feed) on Nostr