Judging Judges: All that is LLM Judgements does not glitter

An examination of where LLM-as-a-Judge can satisfactorily judge another model's performance and where it falls short of human judges.