Tim Hanson on Nostr: npub15swlx…zx855 Hmm, between Dulberg 2023 and Agent57/MEME? Re yr intuition, not ...
npub15swlxudlhx4ttcgsd4556zuqrl57qndxmt4n3dnzrkqn89nxv6lsjzx855 (npub15sw…x855) Hmm, between Dulberg 2023 and Agent57/MEME?
Re your intuition, not really sure; you can factorize an L2 loss into components easily, but this only holds exactly in linear networks... which is likely why explicit modularization/factorization is better.
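One possible reading of the factorization point, written out as a sketch. The symbols ($y$, $\hat{y}$, $W$, $V$, $\phi$, $x$) are notation assumed here for illustration, not taken from the papers mentioned. The squared L2 loss over a vector of targets is a sum of per-component terms, and with a purely linear map each term touches only its own row of weights:

```latex
\begin{aligned}
\mathcal{L} &= \lVert y - \hat{y} \rVert_2^2 = \sum_i (y_i - \hat{y}_i)^2 \\
\text{linear: } \hat{y} = W x
  &\;\Rightarrow\; \frac{\partial \mathcal{L}}{\partial w_i} = -2\,(y_i - w_i^\top x)\,x
  \quad \text{(depends on component } i \text{ only)} \\
\text{shared hidden layer: } \hat{y} = W \phi(V x)
  &\;\Rightarrow\; \frac{\partial \mathcal{L}}{\partial V}
  = \sum_i \frac{\partial (y_i - \hat{y}_i)^2}{\partial V}
  \quad \text{(all components couple through } V \text{)}
\end{aligned}
```

So the loss itself always splits into a sum, but the learning problem only splits into independent pieces when no parameters are shared across components.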
Dulberg uses Huber loss, which combines the two: quadratic (L2-like) for small errors and linear (L1-like) for large ones.
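For reference, a minimal sketch of the standard Huber loss on a single residual. The function name `huber` and the default threshold `delta=1.0` are choices made here for illustration; the threshold actually used in Dulberg 2023 is not stated in this thread.

```python
def huber(residual: float, delta: float = 1.0) -> float:
    """Standard Huber loss for one residual (prediction minus target).

    delta is the assumed switch point between the two regimes.
    """
    a = abs(residual)
    if a <= delta:
        # Small error: quadratic, behaves like (half) squared L2 loss.
        return 0.5 * residual ** 2
    # Large error: linear, behaves like L1 loss, offset so the two
    # pieces meet continuously at |residual| == delta.
    return delta * (a - 0.5 * delta)


small = huber(0.5)   # quadratic regime: 0.5 * 0.5**2 = 0.125
large = huber(3.0)   # linear regime: 1.0 * (3.0 - 0.5) = 2.5
```

The linear tail is what keeps large errors from dominating the gradient the way they would under a pure L2 loss.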