Blogs

Notes, results, and methodology from the DeepQuery team.

DeepQuery: An Arena-based Retrieval Augmented Generation Evaluation Platform

Taaha Khan, Akash Ravandhu, Luke Merletti, Caleb Li, Sayam Goyal

Department of Computer Science, Purdue University

2026-04

Introducing DeepQuery, a benchmarking methodology for RAG systems that uses LLM-as-a-Judge pairwise comparisons and Bradley-Terry ranking. We evaluate vanilla, reranker-based, and Captain pipelines on the OpenRAG-Bench dataset, comparing accuracy, win rate, and latency.
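
For readers unfamiliar with Bradley-Terry ranking, the sketch below shows one common way to turn pairwise win counts into strength scores, using the standard minorization-maximization (MM) update. It is an illustration only, not the DeepQuery implementation: the function name `bradley_terry`, the `wins` input format, and the toy counts for systems "A", "B", and "C" are all assumptions made for this example.

```python
from collections import defaultdict


def bradley_terry(wins, iters=200, tol=1e-9):
    """Estimate Bradley-Terry strengths from pairwise win counts.

    wins[(a, b)] is the number of comparisons in which a beat b
    (hypothetical input format chosen for this sketch).
    Returns a dict of strengths normalized to sum to 1.
    """
    items = sorted({x for pair in wins for x in pair})
    p = {i: 1.0 for i in items}

    total_wins = defaultdict(float)  # total wins per item
    games = defaultdict(float)       # comparisons per unordered pair
    for (a, b), n in wins.items():
        total_wins[a] += n
        games[frozenset((a, b))] += n

    for _ in range(iters):
        new_p = {}
        for i in items:
            # MM update: p_i = W_i / sum_j n_ij / (p_i + p_j)
            denom = sum(
                games[frozenset((i, j))] / (p[i] + p[j])
                for j in items
                if j != i
            )
            new_p[i] = total_wins[i] / denom if denom > 0 else p[i]
        norm = sum(new_p.values())
        new_p = {i: v / norm for i, v in new_p.items()}
        converged = max(abs(new_p[i] - p[i]) for i in items) < tol
        p = new_p
        if converged:
            break
    return p


# Toy, made-up judge outcomes (not results from the study).
toy_wins = {
    ("A", "B"): 2, ("B", "A"): 8,
    ("B", "C"): 4, ("C", "B"): 6,
    ("A", "C"): 1, ("C", "A"): 9,
}
scores = bradley_terry(toy_wins)
ranking = sorted(scores, key=scores.get, reverse=True)
```

Under the Bradley-Terry model, the probability that system i beats system j is p_i / (p_i + p_j), so the fitted strengths give both a ranking and an estimated head-to-head win rate for any pair.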