Learn
Practical guides to LLM evaluation
No fluff. Learn how RAG evaluation works, how to build reliable judges, and how to wire evals into your pipeline.
RAG Evaluation
How to Evaluate RAG Systems: The Complete Guide
The metrics that matter, how to build a golden dataset, and how to wire evaluation into your CI pipeline.
Jan 15, 2025 · 14 min read
Evaluation Methods
LLM-as-a-Judge: A Practical Guide
The three modes, how to fight position and verbosity bias, how to choose a judge model, and how to run it in production without runaway costs.
Jan 20, 2025 · 12 min read
Self-Hosting
Self-Hosting eval.ninja: Complete Deployment Guide
Docker, Kubernetes, AWS ECS, Cloud Run, and serverless — every deployment target with working config files.
Jan 25, 2025 · 10 min read
Comparison
eval.ninja vs Promptfoo
Promptfoo is CLI-first with strong red teaming. eval.ninja is API-first with hosted scoring. An honest comparison of the two.
Feb 1, 2025 · 8 min read