Dragon-sided D on Nostr: “[T]he next model update performs similarly to PhD students on challenging ...
“[T]he next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions.”
https://openai.com/index/learning-to-reason-with-llms/

Published at 2024-09-13 05:02:00

Event JSON
{
  "id": "fb4bb3192e971e6b2016def0a7b917455f2bd7e2ddb7bfb820f7132e64d56602",
  "pubkey": "867b16fbdb961d3620b6e6c1dd0b0124703c4db31d6d71096b866e54dec93560",
  "created_at": 1726203720,
  "kind": 1,
  "tags": [
    [
      "proxy",
      "https://sciencemastodon.com/users/dragonsidedd/statuses/113128487039724668",
      "activitypub"
    ]
  ],
  "content": "“[T]he next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions.”\n\nhttps://openai.com/index/learning-to-reason-with-llms/",
  "sig": "067c5e3a5d56a0072bb690c438adb5fa87387cc97521ff768efad2212e2acf35ff9485c7851269f3a5e83c8c8a2313b88c3d62d02beddf8588b493469351c758"
}
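
For anyone curious how the id field above relates to the rest of the event: under NIP-01, a Nostr event id is the SHA-256 hash of the compact JSON serialization of [0, pubkey, created_at, kind, tags, content], and the sig field is a BIP-340 Schnorr signature over that id. Below is a minimal Python sketch (an illustration added here, not part of the original post) that recomputes the id from the fields shown above; verifying sig would additionally require a secp256k1 library and is omitted.

import hashlib
import json

# Event fields copied from the JSON above.
pubkey = "867b16fbdb961d3620b6e6c1dd0b0124703c4db31d6d71096b866e54dec93560"
created_at = 1726203720
kind = 1
tags = [
    [
        "proxy",
        "https://sciencemastodon.com/users/dragonsidedd/statuses/113128487039724668",
        "activitypub",
    ]
]
content = (
    "“[T]he next model update performs similarly to PhD students on "
    "challenging benchmark tasks in physics, chemistry, and biology. "
    "We also found that it excels in math and coding. In a qualifying "
    "exam for the International Mathematics Olympiad (IMO), GPT-4o "
    "correctly solved only 13% of problems, while the reasoning model "
    "scored 83%. Their coding abilities were evaluated in contests and "
    "reached the 89th percentile in Codeforces competitions.”"
    "\n\nhttps://openai.com/index/learning-to-reason-with-llms/"
)

# NIP-01: the id is the SHA-256 of the UTF-8 JSON serialization of
# [0, pubkey, created_at, kind, tags, content] with no extra whitespace.
serialized = json.dumps(
    [0, pubkey, created_at, kind, tags, content],
    separators=(",", ":"),
    ensure_ascii=False,
)
computed_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
print(computed_id)
# Assuming the viewer rendered the event faithfully, this should print
# the id field above:
# fb4bb3192e971e6b2016def0a7b917455f2bd7e2ddb7bfb820f7132e64d56602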