mostly Gemini 2.5 Pro. it is generally amazing at most things, I just have to follow behind it with a mop.
what you say makes sense. and in the context of arenas, the scorers aren't committing that code, having to interact with it, or iteratively improving it.