jb55 on Nostr:
These are neat, but it feels like they’re optimizing a problem that shouldn’t exist. The paper I linked dropped half of the attention layers without any noticeable impact on performance. Architecture changes like that could have a much larger impact. Wish I had time to tinker with this stuff …
Published at 2024-11-17 13:24:42

Event JSON
{
  "id": "59542d4159e6a845f73f864e6eb8cd34189eeb7b274d326f99d762be8a69bb57",
  "pubkey": "32e1827635450ebb3c5a7d12c1f8e7b2b514439ac10a67eef3d9fd9c5c68e245",
  "created_at": 1731849882,
  "kind": 1,
  "tags": [
    [
      "e",
      "4e92e4c01706c138719a54464aefa28f363d651c9a4b284864af61894d3c6d4b",
      "",
      "root"
    ],
    [
      "e",
      "0754ba81a2a557425c5f0345f4a90e85ccc45fbc2118f1533c043e1bdc17818d",
      "",
      "reply"
    ],
    [
      "p",
      "576d23dc3db2056d208849462fee358cf9f0f3310a2c63cb6c267a4b9f5848f9"
    ]
  ],
  "content": "These are neat, but feels like its optimizing a problem that shouldn’t exist. The paper I linked dropped half of the attention layers without any noticeable impact on performance. Architecture changes like that could have a much larger impact. Wish i had time to tinker with this stuff …",
  "sig": "14867b6a5979128eb5e2e6a7e8aa987e13d372c215d770f03327fae0f7f785595ad1c0fd2eddbe6f3beb3255686c8831bc72a9832f695fe1da518095bc23b92e"
}
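
The `tags` array above follows the NIP-10 threading convention: `"e"` tags with a `"root"` or `"reply"` marker identify the thread root and the note being replied to, and `"p"` tags list the pubkeys of participants being notified. A minimal sketch of reading that structure, using the tags from this event (`parse_thread_tags` is a hypothetical helper, not part of any Nostr library):

```python
# Sketch: extract NIP-10 threading info from a kind-1 note's tags.
# The tag values below are copied verbatim from the event JSON above.
event = {
    "tags": [
        ["e", "4e92e4c01706c138719a54464aefa28f363d651c9a4b284864af61894d3c6d4b", "", "root"],
        ["e", "0754ba81a2a557425c5f0345f4a90e85ccc45fbc2118f1533c043e1bdc17818d", "", "reply"],
        ["p", "576d23dc3db2056d208849462fee358cf9f0f3310a2c63cb6c267a4b9f5848f9"],
    ],
}

def parse_thread_tags(event):
    """Return (root_id, reply_id, mentioned_pubkeys) for a kind-1 note.

    Assumes marked "e" tags per NIP-10: ["e", <event-id>, <relay-url>, <marker>].
    """
    root = reply = None
    mentions = []
    for tag in event["tags"]:
        if tag[0] == "e" and len(tag) >= 4:
            if tag[3] == "root":
                root = tag[1]
            elif tag[3] == "reply":
                reply = tag[1]
        elif tag[0] == "p":
            mentions.append(tag[1])
    return root, reply, mentions

root, reply, mentions = parse_thread_tags(event)
```

So this note replies to event `0754ba81…` inside the thread rooted at `4e92e4c0…`, notifying one pubkey.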