juraj on Nostr:
Question: People often look at the number of trainable parameters (weights) as a proxy for LLM complexity (and memory requirements).

Isn't the number of attention heads also an important parameter to consider when evaluating language models?
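(Illustrative aside: a minimal PyTorch sketch of why head count usually does not change the parameter count. In standard multi-head attention, the Q/K/V and output projections are d_model x d_model regardless of how that width is split across heads, so heads change the shape of the computation, not the number of stored weights. The d_model value below is an assumed example, not taken from any particular model.)

import torch.nn as nn

d_model = 512  # assumed example width; real models vary

# nn.MultiheadAttention requires embed_dim to be divisible by num_heads;
# its projection weights are sized by embed_dim alone, not by num_heads.
for n_heads in (1, 4, 8):
    mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads)
    n_params = sum(p.numel() for p in mha.parameters())
    print(f"{n_heads:2d} heads -> {n_params:,} parameters")

# All three settings print the same count (1,050,624): head count affects
# attention behavior and its compute/memory pattern at runtime, while the
# weight count, and hence model size on disk, stays fixed for a given d_model.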
Published at 2023-11-23 05:17:24

Event JSON
{
  "id": "08acee18db52ad1c8eaf70e179e394a92730f8012f2077411a8fd8b66598d42f",
  "pubkey": "dab6c6065c439b9bafb0b0f1ff5a0c68273bce5c1959a4158ad6a70851f507b6",
  "created_at": 1700716644,
  "kind": 1,
  "tags": [
    ["client", "Nostur"]
  ],
  "content": "Question: People are often looking at number of trainable parameters (weights) as a proxy for llm model complexity (and memory requirements). \n\nIsn't the number of attention heads also an important parameter to consider in evaluating language models?",
  "sig": "bf204466445c660fcbdffa9b9f66ba5d94d73dad6bbfbf3de03d3163d5c175dd9f3cd6b2078e1259b77ab7d468da09bc6c10084bda380b6bf06ca8f7dc48391a"
}