Dave Rahardja /
npub13js…5d7q
2024-10-15 07:42:17

Finally got around to reading the #Apple #LLM paper that’s been going around. tl;dr: LLMs can’t do math because they don’t actually understand concepts; they are just really fancy autocomplete engines.

We knew that already, but this paper quantifies it. The math performance is pretty dismal, even with training that tries to optimize for math. The best performer was OpenAI’s GPT-4o, which scored around 95% on the most basic grade-school word problems. That is 1 in 20 questions wrong, which means it’s not usable for anything in production.
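
To put the 95% figure in perspective, here is a rough back-of-the-envelope sketch. The 95% accuracy is the approximate number quoted above; the function names and the assumption that each question fails independently are mine, not from the paper.

```python
from fractions import Fraction

# Approximate accuracy quoted above (about 95%).
accuracy = Fraction(95, 100)
error_rate = 1 - accuracy  # exactly 1/20


def expected_wrong(n_questions, error_rate=error_rate):
    """Expected number of wrong answers out of n_questions."""
    return n_questions * error_rate


def prob_all_correct(n_questions, accuracy=accuracy):
    """Chance of getting every question right.

    ASSUMPTION: each question fails independently, which is a
    simplification for illustration only.
    """
    return float(accuracy) ** n_questions


if __name__ == "__main__":
    print(error_rate)
    print(expected_wrong(20))
    print(round(prob_all_correct(20), 3))
```

Under that independence assumption, a task that chains 20 such questions together would come out fully correct only a bit more than a third of the time.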

Adding (relevant or irrelevant) clauses or even changing proper names can cause model performance to rapidly collapse.
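
A minimal sketch of what that kind of perturbation can look like: take one word problem, turn it into a template, then swap the name and numbers or splice in a clause that doesn't affect the answer. The template, name list, and extra clause here are made up for illustration and are not taken from the paper's benchmark.

```python
import random

# Hypothetical template; the placeholders are filled in per variant.
TEMPLATE = (
    "{name} picks {a} apples on Monday and {b} apples on Tuesday."
    "{noop} How many apples does {name} have?"
)

# Hypothetical name pool and an irrelevant clause that does not change the answer.
NAMES = ["Sophie", "Ravi", "Maria", "Chen"]
NOOP = " Five of the apples are a bit smaller than average."


def make_variant(rng, add_noop=False):
    """Return (question, correct_answer) for one randomized variant."""
    name = rng.choice(NAMES)
    a = rng.randint(2, 50)
    b = rng.randint(2, 50)
    question = TEMPLATE.format(
        name=name, a=a, b=b, noop=NOOP if add_noop else ""
    )
    # The answer depends only on a and b, never on the name or the extra clause.
    return question, a + b


if __name__ == "__main__":
    rng = random.Random(0)
    for add_noop in (False, True):
        question, answer = make_variant(rng, add_noop=add_noop)
        print(question, "->", answer)
```

Every variant has the same structure and the same solution method, so anything that actually understood the problem should score the same across all of them.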

#ai

https://arxiv.org/pdf/2410.05229
Author Public Key
npub13jszgr40d0pnyum0t845scy8uggn676enygvaf4ajzm2y9rqzd8sy75d7q