Dave Rahardja on Nostr:
nprofile1qy2hwumn8ghj7un9d3shjtnddaehgu3wwp6kyqpql3dx6x0q9ydnycm4k3hnv8dq5tjz8czjmvdyzryax8487759z0wqxxkfeg So from reading the Methodology, here’s what I gathered about how they performed this test:
1. Pick 200 random abstracts (not full papers) from the Journal of Neuroscience.
2. Have ChatGPT modify a random subset of said abstracts to create contradictory conclusions.
3. Present the abstracts to humans and LLMs, and ask each to determine whether the abstract has been altered, and how confident they are in their answer.
Given this methodology, I think “LLMs surpassing human experts in predicting neuroscience results” is…a grandiose claim. More precisely, the claim should be: “LLMs are better than human experts at detecting altered neuroscience abstracts”.
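To make the setup concrete, here’s a minimal sketch of the test as described above. Nothing in it comes from the paper itself: `alter_fn` stands in for ChatGPT flipping a conclusion, `judge_fn` stands in for a human expert or an LLM, and the 50/50 alteration rate is an assumption of mine.

```python
# Hypothetical sketch of the protocol described above -- all names here
# (alter_fn, judge_fn, the 0.5 alteration rate) are stand-ins, not the
# paper's actual code or parameters.

import random
from dataclasses import dataclass

@dataclass
class Trial:
    abstract: str
    altered: bool   # ground truth: was the conclusion flipped?

def build_trials(abstracts, alter_fn, alter_prob=0.5, seed=0):
    """Step 2: alter a random subset of abstracts, keeping the labels."""
    rng = random.Random(seed)
    return [
        Trial(alter_fn(a), True) if rng.random() < alter_prob else Trial(a, False)
        for a in abstracts
    ]

def evaluate(trials, judge_fn):
    """Step 3: judge_fn(abstract) -> (says_altered, confidence in [0, 1])."""
    results = [(judge_fn(t.abstract), t.altered) for t in trials]
    accuracy = sum(says == truth for (says, _), truth in results) / len(trials)
    mean_conf = sum(conf for (_, conf), _ in results) / len(trials)
    return accuracy, mean_conf
```

Note what `evaluate` actually measures: whatever accuracy a judge scores here is its skill at detecting altered abstracts, not at predicting experimental results.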