Dragon-sided D
npub1sea…pev4
2024-09-13 05:02:00

“[T]he next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions.”

https://openai.com/index/learning-to-reason-with-llms/
Author Public Key
npub1sea3d77mjcwnvg9kumqa6zcpy3crcndnr4khzzttseh9fhkfx4sqxnpev4