npub15s…e8xgu on Nostr:
I think of it less as needing human data and more as needing 'grounded' data. 'Grounded' as in: it makes sense because it's made contact with base reality, and therefore isn't totally made up.
For example, an LLM given a many-step arithmetic problem could spout endless bullshit and claim it has the right answer at the end of it.
But give the same LLM access to a calculator and it has a 'grounding' mechanism by which it can maintain contact with reality throughout the completion of its task.
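Not from the post, but here's a toy Python sketch of that calculator-as-grounding idea. The "model" is just a list of hand-written (expression, claimed result) steps; `calc` and `grounded_check` are hypothetical helpers standing in for the tool and the verification loop:

```python
# Toy sketch (the "model" is fake): each intermediate step the model claims
# gets checked against a calculator, so errors can't silently compound.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr: str) -> float:
    """A tiny safe calculator: the grounding mechanism."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def grounded_check(steps):
    """steps: list of (expression, claimed_result) pairs from the model.
    Returns the index of the first step where the claim disagrees with
    the calculator, or None if every step checks out."""
    for i, (expr, claimed) in enumerate(steps):
        if abs(calc(expr) - claimed) > 1e-9:
            return i  # the model's claim contradicts base reality here
    return None

# e.g. a model working through 17*23 + 9 step by step:
steps = [("17*23", 391), ("391+9", 400)]
```

Without the calculator, a wrong intermediate claim just flows into the next step; with it, the error is caught at the exact step it happens.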
This example is at inference-time, but the same principle applies to training time.
All of the latest reasoning models (o1, o3, R1) use this principle to get infinite grounded data, without needing humans to generate it. For an intuition: https://arxiv.org/abs/2203.14465
(They use RL to actually improve their scores on that data, which is where most of the attention is right now, but they still need the data)
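To make the training-time version concrete, here's a minimal sketch of that generate-then-verify loop, in the spirit of the linked paper. `sample_solution` is a hypothetical stand-in for an LLM sampler; the calculator check is the verifier:

```python
# Minimal sketch of verifier-filtered data generation (my own toy, not the
# paper's code): sample candidate answers, keep only the ones the grounding
# mechanism confirms, and that filtered set becomes training data.
import random

def sample_solution(a, b):
    """Hypothetical stand-in for an LLM: sometimes right, sometimes off by one."""
    noise = random.choice([0, 0, 0, 1, -1])
    return a * b + noise

def generate_grounded_data(problems, samples_per_problem=8):
    """Keep only (problem, answer) pairs the verifier confirms.
    No human labels needed: the grounding mechanism supplies the signal."""
    dataset = []
    for a, b in problems:
        for _ in range(samples_per_problem):
            ans = sample_solution(a, b)
            if ans == a * b:             # grounding: check against a calculator
                dataset.append(((a, b), ans))
                break                    # one verified sample per problem
    return dataset
```

Every pair that survives the filter is correct by construction, which is exactly what makes the data 'grounded' rather than made up.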
So I think we'll continue to see immense gains in all domains that have grounding mechanisms, like:
-Math: calculators and proof assistants like Lean
-Coding: code interpreters/compilers
-Logic: languages like Prolog
-Etc.
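For the coding case, the grounding mechanism is just execution. A toy sketch (the candidates here are hand-written lambdas standing in for model samples):

```python
# Sketch of grounding for code: a candidate solution only counts if it
# actually passes tests when run. Execution is the ground truth.

def passes_tests(fn, tests):
    """Run the candidate against input/output pairs."""
    try:
        return all(fn(x) == y for x, y in tests)
    except Exception:
        return False  # crashing code is ungrounded too

tests = [(0, 0), (3, 9), (-2, 4)]            # spec: square the input
candidates = [lambda x: x * 2, lambda x: x * x]
verified = [f for f in candidates if passes_tests(f, tests)]
```

No human has to judge which candidate is right; the interpreter does it.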
Some domains, like creative writing, are hard to build humanless grounding mechanisms for.
This same principle is how AlphaGo, AlphaCode, AlphaStar, etc. became superhuman. They had an initial phase of training on human data, but that only got them to near human level. In the second phase, the models generated infinite grounded data by interacting with their respective grounding mechanisms, and that's how they went superhuman.
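The shape of that second phase can be sketched in a few lines. This is a trivial two-armed bandit, nothing like Go, and the whole setup is my own illustration: the environment's reward signal plays the role of the grounding mechanism, and no human data enters the loop:

```python
# Toy sketch of grounded self-improvement: start with a uniform "policy",
# act, and reinforce only what the environment (the grounding mechanism)
# actually rewards. Move 1 genuinely wins more often, and the policy
# discovers that from reward alone.
import random

def self_improve(rounds=2000, seed=0):
    rng = random.Random(seed)
    weights = [1.0, 1.0]                 # preference weights over two moves
    for _ in range(rounds):
        move = 0 if rng.random() < weights[0] / sum(weights) else 1
        # environment: move 1 wins 90% of the time, move 0 only 10%
        reward = 1 if rng.random() < (0.9 if move == 1 else 0.1) else 0
        weights[move] += 0.1 * reward    # reinforce what reality confirms
    return weights

w = self_improve()
```

The policy ends up preferring the better move without anyone labeling it, which is the whole trick: the environment, not a human, is the teacher.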
Also happens to be why I'm currently at like 50% that we'll get 'AGI' in the next 5-10 years.
#AI #RL #LLM