Dragon-sided D on Nostr: “[T]he next model update performs similarly to PhD students on challenging ...
“[T]he next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions.”
https://openai.com/index/learning-to-reason-with-llms/

Published at 2024-09-13 05:02:00

Event JSON
{
  "id": "fb4bb3192e971e6b2016def0a7b917455f2bd7e2ddb7bfb820f7132e64d56602",
  "pubkey": "867b16fbdb961d3620b6e6c1dd0b0124703c4db31d6d71096b866e54dec93560",
  "created_at": 1726203720,
  "kind": 1,
  "tags": [
    [
      "proxy",
      "https://sciencemastodon.com/users/dragonsidedd/statuses/113128487039724668",
      "activitypub"
    ]
  ],
  "content": "“[T]he next model update performs similarly to PhD students on challenging benchmark tasks in physics, chemistry, and biology. We also found that it excels in math and coding. In a qualifying exam for the International Mathematics Olympiad (IMO), GPT-4o correctly solved only 13% of problems, while the reasoning model scored 83%. Their coding abilities were evaluated in contests and reached the 89th percentile in Codeforces competitions.”\n\nhttps://openai.com/index/learning-to-reason-with-llms/",
  "sig": "067c5e3a5d56a0072bb690c438adb5fa87387cc97521ff768efad2212e2acf35ff9485c7851269f3a5e83c8c8a2313b88c3d62d02beddf8588b493469351c758"
}
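
For anyone curious how the id field above relates to the rest of the event: under NIP-01, a Nostr event id is the SHA-256 hash of the compact JSON serialization of [0, pubkey, created_at, kind, tags, content], and the sig field is a BIP-340 Schnorr signature over that id. Below is a minimal Python sketch (an illustration added here, not part of the original post) that recomputes the id from the fields shown above; verifying sig would additionally require a secp256k1 library and is omitted.

import hashlib
import json

# Event fields copied from the JSON above.
pubkey = "867b16fbdb961d3620b6e6c1dd0b0124703c4db31d6d71096b866e54dec93560"
created_at = 1726203720
kind = 1
tags = [
    [
        "proxy",
        "https://sciencemastodon.com/users/dragonsidedd/statuses/113128487039724668",
        "activitypub",
    ]
]
content = (
    "“[T]he next model update performs similarly to PhD students on "
    "challenging benchmark tasks in physics, chemistry, and biology. "
    "We also found that it excels in math and coding. In a qualifying "
    "exam for the International Mathematics Olympiad (IMO), GPT-4o "
    "correctly solved only 13% of problems, while the reasoning model "
    "scored 83%. Their coding abilities were evaluated in contests and "
    "reached the 89th percentile in Codeforces competitions.”"
    "\n\nhttps://openai.com/index/learning-to-reason-with-llms/"
)

# NIP-01: the id is the SHA-256 of the UTF-8 JSON serialization of
# [0, pubkey, created_at, kind, tags, content] with no extra whitespace.
serialized = json.dumps(
    [0, pubkey, created_at, kind, tags, content],
    separators=(",", ":"),
    ensure_ascii=False,
)
computed_id = hashlib.sha256(serialized.encode("utf-8")).hexdigest()
print(computed_id)
# Assuming the viewer rendered the event faithfully, this should print
# the id field above:
# fb4bb3192e971e6b2016def0a7b917455f2bd7e2ddb7bfb820f7132e64d56602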