juraj on Nostr:
Question: People often look at the number of trainable parameters (weights) as a proxy for LLM complexity (and memory requirements).

Isn't the number of attention heads also an important parameter to consider when evaluating language models?
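(Illustrative aside: a minimal PyTorch sketch of why head count usually does not change the parameter count. In standard multi-head attention, the Q/K/V and output projections are d_model x d_model regardless of how that width is split across heads, so heads change the shape of the computation, not the number of stored weights. The d_model value below is an assumed example, not taken from any particular model.)

import torch.nn as nn

d_model = 512  # assumed example width; real models vary

# nn.MultiheadAttention requires embed_dim to be divisible by num_heads;
# its projection weights are sized by embed_dim alone, not by num_heads.
for n_heads in (1, 4, 8):
    mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads)
    n_params = sum(p.numel() for p in mha.parameters())
    print(f"{n_heads:2d} heads -> {n_params:,} parameters")

# All three settings print the same count (1,050,624): head count affects
# attention behavior and its compute/memory pattern at runtime, while the
# weight count, and hence model size on disk, stays fixed for a given d_model.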
Published at 2023-11-23 05:17:24

Event JSON
{
  "id": "08acee18db52ad1c8eaf70e179e394a92730f8012f2077411a8fd8b66598d42f",
  "pubkey": "dab6c6065c439b9bafb0b0f1ff5a0c68273bce5c1959a4158ad6a70851f507b6",
  "created_at": 1700716644,
  "kind": 1,
  "tags": [
    ["client", "Nostur"]
  ],
  "content": "Question: People are often looking at number of trainable parameters (weights) as a proxy for llm model complexity (and memory requirements). \n\nIsn't the number of attention heads also an important parameter to consider in evaluating language models?",
  "sig": "bf204466445c660fcbdffa9b9f66ba5d94d73dad6bbfbf3de03d3163d5c175dd9f3cd6b2078e1259b77ab7d468da09bc6c10084bda380b6bf06ca8f7dc48391a"
}