Tim Hanson
npub10eg…r2yr
2024-01-26 19:32:13
in reply to nevent1q…gywt

npub15swlxudlhx4ttcgsd4556zuqrl57qndxmt4n3dnzrkqn89nxv6lsjzx855

Also in this ensembling vein: Agent57 [1] and MEME [2] use 32 approximations of the Q-value and policy, each weighting intrinsic and extrinsic reward differently. A multi-armed bandit controller then selects which policy to follow.
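
A rough sketch of the bandit-over-policies idea, for anyone who wants to play with it. This is not the Agent57 or MEME code; it assumes a sliding-window UCB bandit with ε-greedy exploration (similar in spirit to the meta-controller described in [1]), assumes the bandit's reward is the episode return collected while following the chosen policy, and assumes each policy j combines separate extrinsic and intrinsic value heads as Q_e + β_j·Q_i. The names `SlidingWindowUCB` and `combined_q`, plus the default window, bonus scale, and ε, are all hypothetical.

```python
import math
import random
from collections import deque


def combined_q(q_extrinsic, q_intrinsic, beta_j):
    """Hypothetical helper: per-action value for policy j as Q_e + beta_j * Q_i."""
    return [qe + beta_j * qi for qe, qi in zip(q_extrinsic, q_intrinsic)]


class SlidingWindowUCB:
    """Hypothetical bandit meta-controller choosing among n_arms policies.

    Only the most recent `window` (arm, return) pairs are kept, so the
    controller can track policies whose usefulness changes during training.
    """

    def __init__(self, n_arms, window=90, bonus=1.0, epsilon=0.5, seed=0):
        self.n_arms = n_arms
        self.bonus = bonus                   # scale of the UCB exploration bonus
        self.epsilon = epsilon               # chance of picking a uniformly random arm
        self.history = deque(maxlen=window)  # oldest pairs are dropped automatically
        self.rng = random.Random(seed)

    def select(self):
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, ret in self.history:
            counts[arm] += 1
            sums[arm] += ret
        # Any arm with no data inside the current window is tried first.
        for arm in range(self.n_arms):
            if counts[arm] == 0:
                return arm
        # Epsilon-greedy: occasionally ignore the UCB scores entirely.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        total = len(self.history)
        scores = [
            sums[a] / counts[a] + self.bonus * math.sqrt(math.log(total) / counts[a])
            for a in range(self.n_arms)
        ]
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, episode_return):
        self.history.append((arm, episode_return))
```

In use, an actor would call `select()` at the start of each episode to pick a policy index j, act greedily with respect to `combined_q(q_e, q_i, betas[j])`, and then call `update(j, episode_return)`. The sliding window is the design choice that matters here: as the value heads keep learning, an arm that looked bad early can become the best one, and a plain (non-windowed) UCB would be slow to notice.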

MEME is SOTA on Atari, which is slightly harder than a 4-Gaussian-blob world 😉
(neither was cited)

[1] Agent57: Outperforming the Atari Human Benchmark. http://arxiv.org/abs/2003.13350
[2] Human-level Atari 200x faster (MEME). http://arxiv.org/abs/2209.07550
Author Public Key
npub10egtpxtvjwdx00htm464c6hgwmz0ngwn3kgz90rv2qeq9qqqpdcs4jr2yr