npub15s…e8xgu on Nostr:
I think of it less as needing human data and more as needing 'grounded' data. 'Grounded' as in: it makes sense because it's made contact with base reality, and therefore isn't totally made up.
For example, an LLM given a many-step arithmetic problem could spout endless bullshit and claim it has the right answer at the end of it.
But give the same LLM access to a calculator and it has a 'grounding' mechanism by which it can maintain contact with reality throughout the completion of its task.
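Not from the post, but here's a toy Python sketch of that calculator-as-grounding idea. The "model" is just a list of hand-written (expression, claimed result) steps; `calc` and `grounded_check` are hypothetical helpers standing in for the tool and the verification loop:

```python
# Toy sketch (the "model" is fake): each intermediate step the model claims
# gets checked against a calculator, so errors can't silently compound.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def calc(expr: str) -> float:
    """A tiny safe calculator: the grounding mechanism."""
    def ev(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError("unsupported expression")
    return ev(ast.parse(expr, mode="eval").body)

def grounded_check(steps):
    """steps: list of (expression, claimed_result) pairs from the model.
    Returns the index of the first step where the claim disagrees with
    the calculator, or None if every step checks out."""
    for i, (expr, claimed) in enumerate(steps):
        if abs(calc(expr) - claimed) > 1e-9:
            return i  # the model's claim contradicts base reality here
    return None

# e.g. a model working through 17*23 + 9 step by step:
steps = [("17*23", 391), ("391+9", 400)]
```

Without the calculator, a wrong intermediate claim just flows into the next step; with it, the error is caught at the exact step it happens.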
This example is at inference-time, but the same principle applies to training time.
All of the latest reasoning models (o1, o3, R1) use this principle to get infinite grounded data, without needing humans to generate it. For an intuition: https://arxiv.org/abs/2203.14465
(They use RL to actually improve their scores on that data, which is where most of the attention is right now, but they still need the data)
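To make the training-time version concrete, here's a minimal sketch of that generate-then-verify loop, in the spirit of the linked paper. `sample_solution` is a hypothetical stand-in for an LLM sampler; the calculator check is the verifier:

```python
# Minimal sketch of verifier-filtered data generation (my own toy, not the
# paper's code): sample candidate answers, keep only the ones the grounding
# mechanism confirms, and that filtered set becomes training data.
import random

def sample_solution(a, b):
    """Hypothetical stand-in for an LLM: sometimes right, sometimes off by one."""
    noise = random.choice([0, 0, 0, 1, -1])
    return a * b + noise

def generate_grounded_data(problems, samples_per_problem=8):
    """Keep only (problem, answer) pairs the verifier confirms.
    No human labels needed: the grounding mechanism supplies the signal."""
    dataset = []
    for a, b in problems:
        for _ in range(samples_per_problem):
            ans = sample_solution(a, b)
            if ans == a * b:             # grounding: check against a calculator
                dataset.append(((a, b), ans))
                break                    # one verified sample per problem
    return dataset
```

Every pair that survives the filter is correct by construction, which is exactly what makes the data 'grounded' rather than made up.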
So I think we'll continue to see immense gains in all domains that have grounding mechanisms, like:
-Math: calculators and proof assistants like Lean
-Coding: code interpreters/compilers
-Logic: languages like Prolog
-Etc.
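For the coding case, the grounding mechanism is just execution. A toy sketch (the candidates here are hand-written lambdas standing in for model samples):

```python
# Sketch of grounding for code: a candidate solution only counts if it
# actually passes tests when run. Execution is the ground truth.

def passes_tests(fn, tests):
    """Run the candidate against input/output pairs."""
    try:
        return all(fn(x) == y for x, y in tests)
    except Exception:
        return False  # crashing code is ungrounded too

tests = [(0, 0), (3, 9), (-2, 4)]            # spec: square the input
candidates = [lambda x: x * 2, lambda x: x * x]
verified = [f for f in candidates if passes_tests(f, tests)]
```

No human has to judge which candidate is right; the interpreter does it.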
Some domains, like creative writing, are hard to build humanless grounding mechanisms for.
This same principle is how AlphaGo, AlphaCode, AlphaStar, etc. became superhuman. They had an initial phase of training on human data, but that only got them to near human level. In the second phase, the models generated infinite grounded data by interacting with their respective grounding mechanisms, and that's how they went superhuman.
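The shape of that second phase can be sketched in a few lines. This is a trivial two-armed bandit, nothing like Go, and the whole setup is my own illustration: the environment's reward signal plays the role of the grounding mechanism, and no human data enters the loop:

```python
# Toy sketch of grounded self-improvement: start with a uniform "policy",
# act, and reinforce only what the environment (the grounding mechanism)
# actually rewards. Move 1 genuinely wins more often, and the policy
# discovers that from reward alone.
import random

def self_improve(rounds=2000, seed=0):
    rng = random.Random(seed)
    weights = [1.0, 1.0]                 # preference weights over two moves
    for _ in range(rounds):
        move = 0 if rng.random() < weights[0] / sum(weights) else 1
        # environment: move 1 wins 90% of the time, move 0 only 10%
        reward = 1 if rng.random() < (0.9 if move == 1 else 0.1) else 0
        weights[move] += 0.1 * reward    # reinforce what reality confirms
    return weights

w = self_improve()
```

The policy ends up preferring the better move without anyone labeling it, which is the whole trick: the environment, not a human, is the teacher.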
Also happens to be why I'm currently at like 50% that we'll get 'AGI' in the next 5-10 years.
#AI #RL #LLM