What is Nostr?
Abhinav Tushar /
npub1lt2…cne0
2024-11-26 19:00:02

Abhinav Tushar on Nostr: I got to know of an #ICML2024 paper that shows debates helping in scalable oversight. ...

I got to know of an #ICML2024 paper that shows debates helping in scalable oversight. Basically debate 'protocol' with untrustworthy experts (#llm) and weak judge (LLMs or humans) is better at getting the answer in case ground truth is missing. This is interesting since I was trying out something similar a few months back.

Here is the pre-print: https://arxiv.org/pdf/2402.06782

One of the insight disproves what I was assuming earlier:

"Insight 7: More non-expert interaction does not improve accuracy. We find identical judge accuracy between static
and interactive debate. This suggests that adding non-expert interactions does not help in information-asymmetric debates. This is surprising, as interaction allows judges to direct debates towards their key uncertainties."

But there are more insights (and gaps) that could help improve what I was doing in my post on interventional debates here: https://mathstodon.xyz/@lepisma/113332484552749036
Author Public Key
npub1lt2vwap0znasrsawdug096va7wu3l0yzwcq6qtlln0sumqzfja0qdgcne0