Anders Thoresson on Nostr
I need a tool where I can benchmark variants of #LLM #prompts against each other in a structured way. What I have in mind is a service where you enter a couple of alternatives, and have users rank the outputs without knowing which prompt was used.
Is there a solution for this?
To be used with local models, btw.
#ai
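A minimal sketch of the blind-ranking idea described above, not an existing tool. It assumes the outputs have already been generated by a local model (the strings below are placeholders), and every name in it (`make_blind_trial`, `score_ranking`, `aggregate`) is hypothetical. Borda-count scoring is one possible way to combine rankings, chosen here for illustration only.

```python
import random
from collections import defaultdict


def make_blind_trial(outputs_by_prompt, rng):
    """Shuffle the outputs and hide the prompt ids behind neutral labels.

    Returns (shown, key): `shown` maps label -> output text for the rater,
    `key` maps label -> prompt id and stays hidden until scoring.
    """
    items = list(outputs_by_prompt.items())
    rng.shuffle(items)
    labels = [chr(ord("A") + i) for i in range(len(items))]
    shown = {label: output for label, (_, output) in zip(labels, items)}
    key = {label: prompt_id for label, (prompt_id, _) in zip(labels, items)}
    return shown, key


def score_ranking(ranking, key):
    """Borda count: with n outputs, first place gets n-1 points, last gets 0."""
    n = len(ranking)
    return {key[label]: n - 1 - pos for pos, label in enumerate(ranking)}


def aggregate(trial_scores):
    """Sum the per-trial scores for each prompt id."""
    totals = defaultdict(int)
    for scores in trial_scores:
        for prompt_id, points in scores.items():
            totals[prompt_id] += points
    return dict(totals)


if __name__ == "__main__":
    # Placeholder outputs; in practice these would come from a local model.
    outputs = {
        "prompt_v1": "Output produced with variant 1",
        "prompt_v2": "Output produced with variant 2",
        "prompt_v3": "Output produced with variant 3",
    }
    shown, key = make_blind_trial(outputs, random.Random())
    for label, text in shown.items():
        print(label, "->", text)
    # A rater would submit labels ordered best to worst, e.g. ["B", "A", "C"].
    print(aggregate([score_ranking(["B", "A", "C"], key)]))
```

The rater only ever sees the labels and the output text. The label-to-prompt mapping is applied after the ranking is submitted, which is what keeps the comparison blind.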
Published at 2025-01-14 20:46:13
Event JSON
{
  "id": "f6ce1a31ab7bd181faf4e6c5b09e820368d9a1ffe7e888cc09f88d8dcd4953b8",
  "pubkey": "3de09c091d70297fb4830f0cec0c39a2bce3bb905a4b15f7a0af857bcac4cfa4",
  "created_at": 1736887573,
  "kind": 1,
  "tags": [
    ["t", "llm"],
    ["t", "prompts"],
    ["t", "ai"],
    ["proxy", "https://thoresson.social/users/anders/statuses/113828664007192720", "activitypub"]
  ],
  "content": "I need a tool where I can benchmark variants of #LLM #prompts against each other in a structured way. What I have in mind is a service where you enter a couple of alternatives, and have users rank outputs with knowing which prompt was used. \n\nIs there a solution for this? \n\nTo be used with local models, btw. \n\n#ai",
  "sig": "40932168141f2095d897c30ec311cbf54416d4acb1889e28f2e708e203328c222af97cd4f02b4cb1ac64b571f804c6e44a5b2508e65c64d47a7199343578f692"
}