Yohan John 🤖🧠 on Nostr:
"... the attention pattern of a single layer can be ``nearly randomized'', while preserving the functionality of the network. We also show via extensive experiments that these constructions are not merely a theoretical artifact: even after severely constraining the architecture of the model, vastly different solutions can be reached via standard training."
https://arxiv.org/abs/2312.01429
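As a rough illustration of what the quoted claim refers to, here is a minimal toy sketch, not the paper's construction: it only shows, operationally, what replacing a single layer's attention pattern with a random one looks like, and every name in it is invented for the example.

# Toy sketch (assumption: not the method from arXiv:2312.01429): swap one
# layer's learned softmax attention matrix for a random row-stochastic one
# and measure how much the layer's output moves.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n = 16, 8                          # embedding dim, sequence length
x = torch.randn(n, d)                 # toy token embeddings

Wq, Wk, Wv = (torch.randn(d, d) / d ** 0.5 for _ in range(3))

def attention(x, pattern=None):
    # Single-head self-attention; `pattern` overrides the softmax matrix.
    if pattern is None:
        scores = (x @ Wq) @ (x @ Wk).T / d ** 0.5
        pattern = F.softmax(scores, dim=-1)
    return pattern @ (x @ Wv)

out_learned = attention(x)

# Replace the attention pattern with a random row-stochastic matrix.
random_pattern = F.softmax(torch.randn(n, n), dim=-1)
out_random = attention(x, pattern=random_pattern)

print("relative output change:",
      ((out_learned - out_random).norm() / out_learned.norm()).item())

In the paper's constructions, the rest of the network compensates so that such a swap leaves the model's behavior (nearly) unchanged; this toy makes no such adjustment and will generally report a large relative change.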
Published at 2024-02-26 22:13:22

Event JSON
{
  "id": "d59c1447b22be127b243966471cfcff9f225ea5ef1037b6c8592ecd4bdeedcef",
  "pubkey": "32b1255615e9848f3e76314b460d27fada96f9042690c62eadbe35ce742273ef",
  "created_at": 1708985602,
  "kind": 1,
  "tags": [
    [
      "proxy",
      "https://fediscience.org/users/DrYohanJohn/statuses/112000080467478015",
      "activitypub"
    ]
  ],
  "content": "\"... the attention pattern of a single layer can be ``nearly randomized'', while preserving the functionality of the network. We also show via extensive experiments that these constructions are not merely a theoretical artifact: even after severely constraining the architecture of the model, vastly different solutions can be reached via standard training.\"\n\nhttps://arxiv.org/abs/2312.01429",
  "sig": "9013b262295a6c8321876df2b2db460a9f8135c62163c9e97199cffe1f2af3fb7d2a7c21b395b60bd774feb5ed4c536e5f1988f0a0bbb94da63d78442820c076"
}