José A. Alonso on Nostr: U-MATH: A university-level benchmark for evaluating mathematical skills in LLMs. ~ ...
U-MATH: A university-level benchmark for evaluating mathematical skills in LLMs. ~ Konstantin Chernyshev, Vitaliy Polshkov, Ekaterina Artemova, Alex Myasnikov, Vlad Stepanov, Alexei Miasnikov, Sergei Tilga.
https://arxiv.org/abs/2412.03205 #LLMs #Math
Published at
2025-01-16 08:09:03
Event JSON
{
  "id": "fc4dd04230ebcaf35f1d380295b9b22e70f356b457163d61f27484db7da75c11",
  "pubkey": "0efb7bc903f4c6716cd4d07830d344d7abe5b607a156de3cde1ac1a5bf22ae1c",
  "created_at": 1737014943,
  "kind": 1,
  "tags": [
    [
      "t",
      "math"
    ],
    [
      "t",
      "LLMs"
    ],
    [
      "proxy",
      "https://mathstodon.xyz/users/Jose_A_Alonso/statuses/113837011304269124",
      "activitypub"
    ]
  ],
  "content": "U-MATH: A university-level benchmark for evaluating mathematical skills in LLMs. ~ Konstantin Chernyshev, Vitaliy Polshkov, Ekaterina Artemova, Alex Myasnikov, Vlad Stepanov, Alexei Miasnikov, Sergei Tilga. https://arxiv.org/abs/2412.03205 #LLMs #Math",
  "sig": "2641e1c907e1cbb89366ac0aefed11f7ff882eafd90d4c719c994f7ceb871a0cbff6848bef6816831c5b32e69bca5b5bdf09b04a7f9ea6789bf89721ec5d71ea"
}
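
The fields of the event above can be read programmatically. The following is a minimal Python sketch, not part of the original post. It assumes the "Published at" time is `created_at` rendered in UTC, and that the `id` follows the NIP-01 convention (SHA-256 of the compact JSON array `[0, pubkey, created_at, kind, tags, content]`). The helper names `published_at` and `nip01_event_id` are hypothetical and chosen only for illustration.

```python
import hashlib
import json
from datetime import datetime, timezone

# Fields copied verbatim from the event JSON above ("id" and "sig" omitted).
event = {
    "pubkey": "0efb7bc903f4c6716cd4d07830d344d7abe5b607a156de3cde1ac1a5bf22ae1c",
    "created_at": 1737014943,
    "kind": 1,
    "tags": [
        ["t", "math"],
        ["t", "LLMs"],
        [
            "proxy",
            "https://mathstodon.xyz/users/Jose_A_Alonso/statuses/113837011304269124",
            "activitypub",
        ],
    ],
    "content": (
        "U-MATH: A university-level benchmark for evaluating mathematical "
        "skills in LLMs. ~ Konstantin Chernyshev, Vitaliy Polshkov, "
        "Ekaterina Artemova, Alex Myasnikov, Vlad Stepanov, Alexei Miasnikov, "
        "Sergei Tilga. https://arxiv.org/abs/2412.03205 #LLMs #Math"
    ),
}


def published_at(ev):
    """Render created_at (Unix seconds) as a UTC timestamp string."""
    moment = datetime.fromtimestamp(ev["created_at"], tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")


def nip01_event_id(ev):
    """Hash the event the way NIP-01 describes (assumed, see lead-in)."""
    payload = [0, ev["pubkey"], ev["created_at"], ev["kind"], ev["tags"], ev["content"]]
    # Compact separators and raw UTF-8 output, as the serialization requires.
    serialized = json.dumps(payload, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()


print(published_at(event))    # 2025-01-16 08:09:03
print(nip01_event_id(event))  # 64 hex characters, to compare with "id" above
```

If the serialization assumption holds, the second printed value should match the `id` field. A mismatch would more likely point to an escaping or whitespace difference in this sketch than to a problem with the event itself.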