<(0_0<) on Nostr:
I remember seeing papers about reducing LLM memory usage by pruning or compressing less important parameters. Sure, accuracy drops, but not by much.
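The simplest flavor of that idea, magnitude pruning, fits in a few lines. This is just a toy NumPy sketch with made-up names, not the method from any particular paper:

```python
# Toy magnitude pruning: zero out the smallest-magnitude weights,
# keeping only the top `keep_fraction` of them. Illustrative only.
import numpy as np

def magnitude_prune(weights: np.ndarray, keep_fraction: float = 0.5) -> np.ndarray:
    """Zero all but the largest-magnitude `keep_fraction` of weights."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * (1 - keep_fraction))  # number of weights to drop
    # Threshold below which weights are treated as unimportant.
    threshold = np.partition(flat, k)[k] if k > 0 else 0.0
    mask = np.abs(weights) >= threshold
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4))
print(magnitude_prune(w, keep_fraction=0.25))  # ~75% of entries zeroed
```

In practice the pruned matrices are then stored in a sparse or quantized format, which is where the actual memory savings come from.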
I’ve also heard a lot of talk about creating a bunch of expert “mini” AIs that an orchestrator AI would invoke to complete a user’s request.
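If that's the mixture-of-experts idea (which is what it sounds like to me), the routing trick looks roughly like this. Everything here is a toy NumPy sketch with illustrative names, not anyone's actual architecture:

```python
# Toy mixture-of-experts layer: a router scores experts per input,
# and only the top-scoring experts actually run. Illustrative only.
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

class TinyMoE:
    def __init__(self, n_experts: int, dim: int, top_k: int = 2, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.router = rng.normal(size=(dim, n_experts))  # gating weights
        self.experts = [rng.normal(size=(dim, dim)) for _ in range(n_experts)]
        self.top_k = top_k

    def __call__(self, x: np.ndarray) -> np.ndarray:
        scores = softmax(x @ self.router)           # one score per expert
        chosen = np.argsort(scores)[-self.top_k:]   # invoke only the top-k experts
        out = np.zeros_like(x)
        for i in chosen:
            out += scores[i] * (x @ self.experts[i])  # weighted expert outputs
        return out

moe = TinyMoE(n_experts=8, dim=16)
y = moe(np.random.default_rng(1).normal(size=16))
print(y.shape)  # (16,)
```

The win is that most experts sit idle on any given token, so total parameters can be huge while the compute per token stays small.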
The only thing new to me is the “multi-token” approach, where the model predicts several future tokens per step instead of just one.
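As I understand it (and this is me guessing at the common setup, with several output heads each predicting a token further ahead), it looks something like this toy sketch:

```python
# Toy multi-token prediction: several output heads read the same hidden
# state, each producing logits for a token k steps ahead. Illustrative only.
import numpy as np

def multi_token_logits(hidden: np.ndarray, heads: list[np.ndarray]) -> np.ndarray:
    """Return logits for the next len(heads) tokens from one hidden state."""
    # heads[k] maps the hidden state to vocabulary logits for position t+k+1
    return np.stack([hidden @ w for w in heads])

dim, vocab, n_future = 16, 100, 4
rng = np.random.default_rng(2)
hidden = rng.normal(size=dim)                      # last hidden state at step t
heads = [rng.normal(size=(dim, vocab)) for _ in range(n_future)]
logits = multi_token_logits(hidden, heads)         # shape (4, 100)
next_tokens = logits.argmax(axis=1)                # greedy guesses for t+1..t+4
print(logits.shape, next_tokens)
```

The appeal is getting a denser training signal per forward pass, and potentially faster generation if the extra predictions are kept at inference time.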
Looks like the smaller startup was able to pivot to the newer ideas faster than the big guys. It should be interesting to see what OpenAI and the others do now that this has dropped into the open-source world.