juraj / Juraj
npub1m2m…r8p9
2023-11-23 05:17:24

Question: People often look at the number of trainable parameters (weights) as a proxy for LLM complexity (and memory requirements).

Isn't the number of attention heads also an important parameter to consider in evaluating language models?
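
One way to make the question concrete is to count the weights of a single attention layer. The sketch below is only an illustration, not any particular model's code. It assumes the common multi-head convention in which each head has width d_model / n_heads, and `attention_params` is a hypothetical helper name. Under that assumption the head count changes how the projections are split, not how many weights there are.

```python
def attention_params(d_model: int, n_heads: int, bias: bool = True) -> int:
    """Trainable parameters in one multi-head self-attention layer.

    Assumption: each head has width d_model // n_heads (the usual convention),
    so the head count only partitions the projections into blocks.
    """
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    d_head = d_model // n_heads
    # Query, key and value projections: n_heads blocks of shape (d_model, d_head).
    qkv = 3 * n_heads * d_model * d_head
    # Output projection maps the concatenated heads back to d_model.
    out = n_heads * d_head * d_model
    # One bias vector of length d_model for each of the four projections.
    biases = 4 * d_model if bias else 0
    return qkv + out + biases


# Same model width, different head counts (illustrative values).
counts = {h: attention_params(768, h) for h in (1, 4, 12)}
print(counts)
```

With these assumptions every entry in `counts` is the same number, so the head count would not show up in the parameter total even though it changes how attention is computed.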
Author Public Key
npub1m2mvvpjugwdehtaskrcl7ksvdqnnhnjur9v6g9v266nss504q7mqvlr8p9