José A. Alonso /
npub1pma…v8pw
2025-01-16 08:09:03

U-MATH: A university-level benchmark for evaluating mathematical skills in LLMs. ~ Konstantin Chernyshev, Vitaliy Polshkov, Ekaterina Artemova, Alex Myasnikov, Vlad Stepanov, Alexei Miasnikov, Sergei Tilga. https://arxiv.org/abs/2412.03205 #LLMs #Math
Author Public Key
npub1pmahhjgr7nr8zmx56purp56y6747tds859tdu0x7rtq6t0ez4cwqfnv8pw