Andrej Karpathy / @karpathy (RSS Feed) on Nostr: I wonder if von Neumann had a large d\_model, n\_layer, head\_size or block\_size, or ...
I wonder if von Neumann had a large d\_model, n\_layer, head\_size or block\_size, or kv cache. All of these hyperparams might manifest slightly differently.
https://nitter.moomoo.me/karpathy/status/1642678769126350855#m