Nazo on Nostr:
nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpqsu4t3c2h2fl2ws6hfcnn5r7jdn87qvhvaafe5uv3hcwygxz3rjss4mvvfe I presume this means in the training itself? According to RULER, it seems even most 128K-context models are effectively 64K at best. So they've also been taking shortcuts as a way to get that cost down...
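For anyone curious what "effectively 64K" means in practice, here's a rough sketch of the kind of needle-in-a-haystack probe that RULER-style evaluations are built around. This is not RULER's actual harness; build_prompt, probe, fake_model, and the passphrase needle are all made up for illustration, and you'd swap fake_model for a real API call.

```python
# Minimal needle-in-a-haystack sketch: pad filler text out to a target
# length, bury one fact ("the needle") at some depth, and check whether
# the model can still answer a question about it.

FILLER = "The grass is green. The sky is blue. The sun is bright. "
NEEDLE = "The secret passphrase is 'maple-7491'."
QUESTION = "What is the secret passphrase?"


def build_prompt(context_words: int, needle_depth: float) -> str:
    """Pad to roughly `context_words` words, insert the needle at a
    fractional depth (0.0 = start, 1.0 = end), then append the question."""
    words = (FILLER * (context_words // len(FILLER.split()) + 1)).split()
    words = words[:context_words]
    words.insert(int(len(words) * needle_depth), NEEDLE)
    return " ".join(words) + "\n\n" + QUESTION


def probe(model_fn, lengths=(8_000, 32_000, 64_000, 128_000), depths=(0.1, 0.5, 0.9)):
    """The length at which recall collapses is the 'effective' context,
    regardless of the advertised window size."""
    for n in lengths:
        hits = sum("maple-7491" in model_fn(build_prompt(n, d)) for d in depths)
        print(f"{n:>7} words: {hits}/{len(depths)} needles recovered")


def fake_model(prompt: str) -> str:
    """Dummy model that only 'sees' the last ~64k words, to show the idea."""
    tail = " ".join(prompt.split()[-64_000:])
    return "maple-7491" if "maple-7491" in tail else "I don't know."


probe(fake_model)
```

With the dummy model the needle gets recovered at every length up to 64K, then starts getting missed at 128K whenever it sits early in the prompt, which is basically the pattern RULER reports for a lot of nominally 128K models.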
I wonder if they really even need that much, the way most people are using them. Most stuff is pretty close to one-shot, so it's kind of wasteful in those cases.