Brandon Rohrer on Nostr:
This is the open secret of reinforcement learning. Sure, there are methods that can optimize against arbitrary reward functions, but the process of choosing a reward function to get the behavior you want is the darkest of arts.
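A minimal sketch of the point above, not taken from the post: everything here (the five-tile corridor, the names `goal_only` and `goal_plus_bonus`, the 0.5 bonus, the discount of 0.9) is a hypothetical toy setup chosen for illustration. The same optimizer, plain value iteration, handles both reward functions without complaint; only the reward choice decides whether the resulting behavior is the one intended.

```python
# Toy corridor: tiles 0..4, the agent should walk to tile 4 (the goal).
# All names and numbers below are illustrative assumptions.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)   # step left, step right
GAMMA = 0.9          # discount factor


def step(state, action):
    """Deterministic move, clipped at the corridor walls."""
    return min(max(state + action, 0), N_STATES - 1)


def goal_only(next_state):
    """Reward 1 for arriving at the goal, nothing else."""
    return 1.0 if next_state == GOAL else 0.0


def goal_plus_bonus(next_state):
    """Same goal reward, plus a well-meant 0.5 bonus for touching tile 0."""
    return goal_only(next_state) + (0.5 if next_state == 0 else 0.0)


def greedy_policy(reward, sweeps=500):
    """Value iteration for an arbitrary reward function, then act greedily."""
    values = [0.0] * N_STATES  # the goal is terminal, so its value stays 0

    def q(state, action):
        nxt = step(state, action)
        return reward(nxt) + GAMMA * values[nxt]

    for _ in range(sweeps):
        values = [
            0.0 if s == GOAL else max(q(s, a) for a in ACTIONS)
            for s in range(N_STATES)
        ]
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in range(GOAL)}


def rollout(policy, start=1, max_steps=20):
    """Follow the policy from `start` and return the visited tiles."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == GOAL:
            break
        state = step(state, policy[state])
        path.append(state)
    return path


if __name__ == "__main__":
    print("goal_only:      ", rollout(greedy_policy(goal_only)))
    print("goal_plus_bonus:", rollout(greedy_policy(goal_plus_bonus)))
```

With `goal_only` the agent walks straight to tile 4. With `goal_plus_bonus` the optimizer works just as well, but the best return now comes from parking against the left wall and collecting the bonus forever, so the agent never reaches the goal.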
Published at
2024-06-25 15:16:32
Event JSON
{
"id": "b518c52e2015df59a5f7f41b7e1026e90a082931ef78c3f69ceb51189cd32f35",
"pubkey": "95ea081a627cee44e532825986ecc662139d068c4bdacbe820a8f445b9c6c06b",
"created_at": 1719328592,
"kind": 1,
"tags": [
[
"p",
"6f9534fa269f1eaee951aa573bb4a5887fe94136e6ee13b3e0dac087d2e2d186"
],
[
"proxy",
"https://recsys.social/@brohrer/112677918634928422",
"web"
],
[
"e",
"e9748481d124a6f0c51fd500867fd90419f7928535b0e41167d90e3234891a3f",
"",
"root"
],
[
"proxy",
"https://recsys.social/users/brohrer/statuses/112677918634928422",
"activitypub"
],
[
"L",
"pink.momostr"
],
[
"l",
"pink.momostr.activitypub:https://recsys.social/users/brohrer/statuses/112677918634928422",
"pink.momostr"
],
[
"expiration",
"1721920596"
]
],
"content": "This is the open secret of reinforcement learning. Sure, there are methods that can optimize against arbitrary reward functions, but the process of choosing a reward function to get the behavior you want is the darkest of arts.",
"sig": "4cb15a1b6b515eeecfbf8547e5c02431ec7eaab51c7f4f5a671afeb682d9c9ed825da2a1d50379b6fc47df4643715cc177667eaa29e79499986c8228a06e8a17"
}