This is a baby GPT with two tokens 0/1 and context length of 3, viewing it as a ...

Andrej Karpathy / @karpathy (RSS Feed) /

2023-04-09 17:25:03

This is a baby GPT with two tokens (0/1) and a context length of 3, viewed as a finite-state Markov chain. It was trained on the sequence "111101111011110" for 50 iterations. The parameters and the architecture of the Transformer modify the probabilities on the arrows.
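To make the Markov-chain view concrete, here is a minimal sketch that enumerates the 8 possible states (all 3-token contexts over 0/1) and counts how often each one is followed by 0 or 1 in the training sequence. This is plain counting, not a Transformer: it only shows the empirical transition statistics a trained model would be pulled toward. The function names (`all_states`, `transition_counts`, `transition_probs`) are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

SEQ = "111101111011110"  # training sequence from the post
CONTEXT = 3              # context length from the post


def all_states(context=CONTEXT):
    """Every possible context of `context` binary tokens (2**context states)."""
    return ["".join(bits) for bits in product("01", repeat=context)]


def transition_counts(seq=SEQ, context=CONTEXT):
    """Count which token follows each full-length context in `seq`."""
    counts = defaultdict(Counter)
    for i in range(len(seq) - context):
        state = seq[i:i + context]
        next_token = seq[i + context]
        counts[state][next_token] += 1
    return counts


def transition_probs(seq=SEQ, context=CONTEXT):
    """Empirical next-token probabilities per state; None if the state never occurs."""
    counts = transition_counts(seq, context)
    probs = {}
    for state in all_states(context):
        total = sum(counts[state].values())
        if total == 0:
            probs[state] = None  # state never appears in the training data
        else:
            probs[state] = {tok: counts[state][tok] / total for tok in "01"}
    return probs


if __name__ == "__main__":
    for state, p in transition_probs().items():
        print(state, p)
```

In this sequence the state 111 is followed by 0 and by 1 equally often, while 110, 101, and 011 are always followed by 1. The states 000, 001, 010, and 100 never occur, so the data says nothing about their outgoing arrows.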

E.g. we… nitter.moomoo.me/i/web/status/164… (https://nitter.moomoo.me/i/web/status/1645115622517542913)

https://nitter.moomoo.me/karpathy/status/1645115622517542913#m

Author Public Key

npub1rj7u39tvjdgfpzg3c3xfym6vzalt34p7t5uvdsqhzgst9jtl7dgqs2ffmk

