Yohan John 🤖🧠 / npub1x2c…jjtg
2024-02-26 22:13:22


"... the attention pattern of a single layer can be ``nearly randomized'', while preserving the functionality of the network. We also show via extensive experiments that these constructions are not merely a theoretical artifact: even after severely constraining the architecture of the model, vastly different solutions can be reached via standard training."

https://arxiv.org/abs/2312.01429
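
To make the quoted claim concrete: "randomizing an attention pattern" means overriding the row-stochastic matrix of attention weights in one layer while leaving the rest of the forward pass in place. The toy sketch below (my own illustration, not the paper's construction, with made-up weights and dimensions) builds a two-layer attention-only network in numpy, forces the first layer's attention matrix to a random row-stochastic one, and measures how far the output moves; in the paper's constructions the remaining parameters are arranged so that the network's behaviour is nevertheless preserved.

# Hypothetical sketch, not code from the paper: probe the effect of
# overriding one layer's attention pattern in a toy attention-only network.
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 8  # embedding dimension, sequence length (arbitrary toy values)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_layer(X, Wq, Wk, Wv, A_override=None):
    # Single-head self-attention with a residual connection.
    # If A_override is given, it replaces the computed attention pattern.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(d)) if A_override is None else A_override
    return X + A @ V

# Random stand-in weights for two layers (not trained parameters).
params = [tuple(rng.normal(scale=d**-0.5, size=(d, d)) for _ in range(3))
          for _ in range(2)]
X = rng.normal(size=(n, d))

# Baseline forward pass.
Y = attention_layer(attention_layer(X, *params[0]), *params[1])

# Replace layer 1's attention pattern with a random row-stochastic matrix.
A_rand = softmax(rng.normal(size=(n, n)))
Y_rand = attention_layer(attention_layer(X, *params[0], A_override=A_rand),
                         *params[1])

print("relative output change:", np.linalg.norm(Y - Y_rand) / np.linalg.norm(Y))

With generic random weights the output changes substantially; the result quoted above is that the other weights can be chosen (or reached by standard training under architectural constraints) so that such a randomized pattern leaves the network's function nearly unchanged.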
Author Public Key
npub1x2cj24s4axzg70nkx995vrf8ltdfd7gyy6gvvt4dhc6uuapzw0hst2jjtg