I need a tool where I can benchmark variants of #LLM #prompts against each other in a ...

2025-01-14 20:46:13

I need a tool where I can benchmark variants of #LLM #prompts against each other in a structured way. What I have in mind is a service where you enter a couple of alternatives and have users rank the outputs without knowing which prompt was used.

Is there a solution for this?

To be used with local models, btw.
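
A minimal sketch of the blind-ranking idea, not an existing tool. It assumes a hypothetical `generate` callable that wraps whatever local model is in use, and the function names `make_blind_trial` and `mean_ranks` are made up for illustration. Each rater sees the outputs in shuffled order and ranks them; the mapping back to prompt variants stays hidden until the ranks are aggregated.

```python
import random
from collections import defaultdict
from typing import Callable


def make_blind_trial(
    prompts: dict[str, str],
    generate: Callable[[str], str],  # hypothetical wrapper around a local model
    rng: random.Random,
) -> tuple[list[str], list[str]]:
    """Run every prompt variant and return the outputs in shuffled order.

    The second return value is the hidden key: key[i] is the name of the
    variant that produced outputs[i]. Only `outputs` is shown to the rater.
    """
    key = list(prompts)
    rng.shuffle(key)
    outputs = [generate(prompts[name]) for name in key]
    return outputs, key


def mean_ranks(trials: list[tuple[list[str], list[int]]]) -> dict[str, float]:
    """Aggregate rankings into a mean rank per prompt variant (lower is better).

    Each trial is (hidden_key, ranking), where ranking lists output indices
    in the order the rater preferred them, best first.
    """
    positions = defaultdict(list)
    for key, ranking in trials:
        for position, output_index in enumerate(ranking, start=1):
            positions[key[output_index]].append(position)
    return {name: sum(p) / len(p) for name, p in positions.items()}
```

Mean rank is only one way to aggregate; with many variants or raters, pairwise win rates or an Elo-style score may be a better fit.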

#ai

Author Public Key

npub18hsfczgawq5hldyrpuxwcrpe527w8wustf93taaq47zhhjkye7jqqves7y


Anders Thoresson on Nostr: I need a tool where I can benchmark variants of #LLM #prompts against each other in a ...