MachuPikacchu on Nostr:
Something seemingly overlooked in all the DeepSeek talk is that Google recently published a successor to the transformer architecture, called Titans [1].
For anyone who doesn’t know, virtually all of the frontier AI models are based on the transformer architecture, which uses something called an attention mechanism. Attention helps the model pick out the relevant tokens in the input sequence when predicting the output sequence.
In attention, each token is projected into query, key, and value vectors by weight matrices that are learned during training. Once training is complete those weights are frozen and the model remains static. This means that unless you bolt on some type of external memory in your workflow (e.g. store the inputs and outputs in a vector database and have your LLM query it in a RAG setup), your model is limited to what it has already been trained on plus whatever fits in its context window.
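To make that concrete, here’s a minimal sketch of single-head scaled dot-product attention in PyTorch. The projection matrices W_q, W_k, W_v stand in for the learned weights; the names and dimensions are illustrative, not taken from any particular model.

```python
import torch
import torch.nn.functional as F

d_model = 64  # illustrative embedding size

# Learned projection matrices. After training these are frozen:
# nothing in here changes at inference time.
W_q = torch.randn(d_model, d_model)
W_k = torch.randn(d_model, d_model)
W_v = torch.randn(d_model, d_model)

def attention(x):
    """Single-head scaled dot-product attention over x of shape (seq_len, d_model)."""
    Q = x @ W_q                          # queries
    K = x @ W_k                          # keys
    V = x @ W_v                          # values
    scores = Q @ K.T / (d_model ** 0.5)  # relevance of every token to every other token
    weights = F.softmax(scores, dim=-1)
    return weights @ V                   # weighted mix of value vectors

x = torch.randn(10, d_model)  # a 10-token input sequence
out = attention(x)            # shape (10, d_model)
```

Everything learnable in that sketch is fixed before the model ever sees your prompt; attention can only mix information that is already sitting in the context window.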
What this new architecture proposes is adding a long-term memory module that can be updated as well as queried at inference time. You add another neural network into the model whose job is to store and retrieve information in that long-term memory, and it is trained to do so as part of regular training.
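The paper’s actual mechanism is more involved (as I understand it, a surprise-based update with momentum and a forgetting term), but a heavily simplified sketch of the core idea looks something like this: a small MLP acts as an associative memory, and its parameters are updated by gradient steps while the model is running. The loss, learning rate, and layer sizes below are my own illustrative assumptions, not the paper’s.

```python
import torch
import torch.nn as nn

d_model = 64

# A small MLP used as a long-term memory: it learns a key -> value mapping.
# Unlike the frozen attention weights above, its parameters keep changing at inference time.
memory = nn.Sequential(
    nn.Linear(d_model, d_model),
    nn.SiLU(),
    nn.Linear(d_model, d_model),
)
opt = torch.optim.SGD(memory.parameters(), lr=1e-2)

def memorize(k, v):
    """Write: one gradient step at inference time nudging memory(k) toward v."""
    loss = ((memory(k) - v) ** 2).mean()  # how badly the memory currently predicts v from k
    opt.zero_grad()
    loss.backward()
    opt.step()

def recall(q):
    """Read: a plain forward pass, no parameter update."""
    with torch.no_grad():
        return memory(q)

# Illustrative usage: write an association during "inference" and read it back.
k, v = torch.randn(d_model), torch.randn(d_model)
before = ((recall(k) - v) ** 2).mean().item()
for _ in range(200):
    memorize(k, v)
after = ((recall(k) - v) ** 2).mean().item()
print(before, after)  # the error shrinks as the association is written into the memory's weights
```

The contrast with the attention sketch is the point: here the module’s weights change while the model is in use, and running those gradient updates at inference time is where the extra data and compute requirements come from.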
Where this seems to be heading is that the leading AI labs can release open-weight models that are good at learning, but to really benefit from them you’ll need a lot of inference-time data and compute, which very few people have. It’s another centralizing force in AI.
1. https://arxiv.org/abs/2501.00663