David Meyer on Nostr
That you could actually learn through this kind of policy optimization is really an amazing result/insight.
A few of my notes on this and related topics are here:
https://davidmeyer.github.io/ml/pg.pdf. As always, questions/comments/corrections/* greatly appreciated.
#policygradients #reinforcementlearning #machinelearning #math #maths
Published at 2024-09-10 12:58:06
Event JSON
{
"id": "cdbb8568ce3873fe0e5207306a7912e3b8eb15f6651c29ae85de82d9a4be1e12",
"pubkey": "ce6cad7d0619fb4c94c45f2f0ef0d33aa5a22ecde2bc6cf00cac0551ad907694",
"created_at": 1725973086,
"kind": 1,
"tags": [
[
"t",
"maths"
],
[
"t",
"math"
],
[
"t",
"machinelearning"
],
[
"t",
"reinforcementlearning"
],
[
"t",
"policygradients"
],
[
"proxy",
"https://mathstodon.xyz/users/dmm/statuses/113113372227987547",
"activitypub"
]
],
"content": "That you could actually learn through this kind of policy optimization is really an amazing result/insight. \n\nA few of my notes on this and related topics are here: \nhttps://davidmeyer.github.io/ml/pg.pdf. As always, questions/comments/corrections/* greatly appreciated.\n\n#policygradients #reinforcementlearning #machinelearning #math #maths\n\nhttps://media.mathstodon.xyz/media_attachments/files/113/113/371/594/486/501/original/8b0c22aa928685e9.jpg",
"sig": "0cfeb39196de34ce8f148f676b5e363e52f720d715906e0ed63c00e0c3d1eee402e893f36af0810fc5dcd876889a0bb36c7d436836e1ac3cd130954e71c83831"
}