theHigherGeometer
npub1p2q…c0lr
2024-11-08 10:42:49
in reply to nevent1q…crkr

npub16t62rkttt6aduudqya89lvfallx59f4g6fltmqdhhr9jt3jw6s5q3lhgu7 (npub16t6…hgu7) "We evaluated six leading language models on our existing subset of FrontierMath problems: o1-preview (OpenAI 2024b), o1-mini (OpenAI 2024d), and GPT-4o (2024-08-06 version) (OpenAI 2024a), Claude 3.5 Sonnet (2024-10-22 version) (Anthropic 2024b), Grok 2 Beta (XAI 2024), and Google DeepMind’s Gemini 1.5 Pro 002 (GoogleAI 2024). All models had a very low performance on FrontierMath problems, with no model achieving even a 2% success rate on the full benchmark"

he he he.
Author Public Key
npub1p2q4c7sn2jgtj3w7g9syy5zjldxd8e5ruknf99g8y636ls8vx8esq5c0lr