Replacing the Judge: Can Llama 405B Outperform GPT-4 in the Court of AI?

While LLM-as-a-Judge offers an attractive alternative to human evaluation, relying on closed-source LLMs as judges imposes some limitations on this evaluation framework.