someone on Nostr:
RLNF: Reinforcement Learning from Nostr Feedback
We ask a question to two different LLMs.
We let nostriches vote which answer is better.
We use the feedback to further fine-tune the LLM.
We zap the nostriches.
AI gets super wise.
Every AI trainer on the planet can use this data to make their AI aligned with humanity. AHA succeeds.
Thoughts?
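The steps above amount to collecting pairwise human preferences, the same raw material RLHF/DPO-style fine-tuning consumes. A minimal sketch of the aggregation step, turning individual nostrich votes into (prompt, chosen, rejected) triples; all names here (`Vote`, `build_preference_pairs`) are hypothetical, not an existing API:

```python
# Hypothetical sketch: aggregate per-prompt votes between two LLM
# answers into preference pairs usable for fine-tuning.
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Vote:
    prompt: str
    answer_a: str    # answer from LLM A
    answer_b: str    # answer from LLM B
    prefers_a: bool  # the voter's choice

def build_preference_pairs(votes):
    """Return (prompt, chosen, rejected) dicts for prompts with a clear majority."""
    tally = defaultdict(lambda: [0, 0])  # prompt -> [votes for A, votes for B]
    answers = {}
    for v in votes:
        tally[v.prompt][0 if v.prefers_a else 1] += 1
        answers[v.prompt] = (v.answer_a, v.answer_b)
    pairs = []
    for prompt, (a_votes, b_votes) in tally.items():
        if a_votes == b_votes:
            continue  # tie: no consensus, drop the prompt
        a, b = answers[prompt]
        chosen, rejected = (a, b) if a_votes > b_votes else (b, a)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs

# Example: two votes for A, one for B -> A is the chosen answer.
votes = [
    Vote("q1", "ans A", "ans B", True),
    Vote("q1", "ans A", "ans B", True),
    Vote("q1", "ans A", "ans B", False),
]
pairs = build_preference_pairs(votes)
```

The zap amounts paid out per vote could also be logged alongside each `Vote` to weight trusted voters, but that is beyond this sketch.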