someone on Nostr: i have about 1000 questions that I ask every model. i accept some models as ground ...
I have about 1000 questions that I ask every model. I accept some models as ground truth, then compare the tested model's answers with the ground truth and count how many agree. Building the ground truth is the harder part. Check out the Based LLM Leaderboard on wikifreedia.
Ostrich is a ground truth. I have continued to build on it over the months. Mike Adams, after stopping for a while, has also come back and is building a newer one.
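The counting step described in the note can be sketched roughly as follows. This is a minimal illustration, not the author's actual pipeline: the note does not say how agreement is judged, so exact match after simple normalization is assumed here, and the names `normalize`, `agreement_count`, `score`, and the toy answers are all hypothetical.

```python
from typing import Dict

Answers = Dict[str, str]  # question id -> answer text


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(answer.lower().split())


def agreement_count(tested: Answers, truth: Answers) -> int:
    """Count questions where the tested model's answer matches the ground truth."""
    return sum(
        1
        for qid, truth_answer in truth.items()
        if qid in tested and normalize(tested[qid]) == normalize(truth_answer)
    )


def score(tested: Answers, ground_truths: Dict[str, Answers]) -> Dict[str, int]:
    """Agreement count of one tested model against each ground-truth model."""
    return {
        name: agreement_count(tested, truth)
        for name, truth in ground_truths.items()
    }


# Toy data: three questions instead of about 1000; every answer is made up.
ground_truths = {
    "ground_truth_a": {"q1": "yes", "q2": "no", "q3": "yes"},
    "ground_truth_b": {"q1": "yes", "q2": "yes", "q3": "yes"},
}
tested_model = {"q1": "Yes", "q2": "no ", "q3": "no"}

scores = score(tested_model, ground_truths)
print(scores)
```

Free-form answers would likely need a looser comparison than exact match (for example a judge model or a rubric), which is part of why building the ground truth is the harder part.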
Published at
2025-01-11 16:21:19
Event JSON
{
"id": "2efbf5e61b51f27fbd6ce0f97a5be641299ff81f32e774d48ad657e988c8ce65",
"pubkey": "9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1",
"created_at": 1736612479,
"kind": 1,
"tags": [
[
"p",
"9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1",
""
],
[
"p",
"40b9c85fffeafc1cadf8c30a4e5c88660ff6e4971a0dc723d5ab674b5e61b451",
""
],
[
"e",
"a10288cf6210fc4934c2a740ca31598bd69d9844b1404eb66a96669d1ce954ba",
"wss://a.nos.lol",
"reply",
"40b9c85fffeafc1cadf8c30a4e5c88660ff6e4971a0dc723d5ab674b5e61b451"
],
[
"e",
"cbebef72933547e012770d35898f44c694ce5e3782736ed68b7a9e7d0af8d3f1",
"",
"root",
"9fec72d579baaa772af9e71e638b529215721ace6e0f8320725ecbf9f77f85b1"
]
],
"content": "i have about 1000 questions that I ask every model. i accept some models as ground truth. then compare the tested with the ground truth and count the number of answers that agree with the ground truth. building ground truth is the harder part. check out Based LLM Leaderboard on wikifreedia.\nOstrich is a ground truth. I continued to build on it over the months. Mike Adams also after stopping for a while, came back and building a newer one.",
"sig": "35c2ab0c1562c8a5480651893fcaabd093094bf5a2a5082031a4c128dce0ee2e3d47da2225494d272064d93bdb7dc2931b57aeb5222cfa171e0841709a9ec05f"
}