Evals for AI Agents
1. What are Agent Evaluations?

1. Introduction to Evals
2. Generic vs. Targeted Evals
3. Regression Testing - Online vs. Offline Evals
4. The 3 Data Pillars of Evaluation
Wrap Up Quiz
Practical 1: Intro to Course Project
2. Human-in-the-Loop Evals

6. Human-in-the-Loop Evaluation
7. Designing Human Evaluations
8. From Annotations to Patterns
Wrap Up Quiz
Practical 2: Observing and Annotating Your Traces
3. LLM-as-Judge

10. LLM-as-Judge
11. When to Use LLM-as-Judge
12. Building Effective Judge Prompts
Wrap Up Quiz
Practical 3: Creating Evaluators from Issues
4. Programmatic Rules

14. Programmatic Rule Evaluations
15. When to Use Programmatic Rules
16. Designing Effective Programmatic Rules
17. Integrating the 3 Types of Evals
Wrap Up Quiz
Practical 4: Creating a Golden Dataset
You made it!

Get Your Certificate

3. LLM-as-Judge

Learn to automate human-like judgment at scale using a model to score your agent's outputs against criteria you define.

5 Lessons
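
To make the idea concrete, here is a minimal sketch of a judge in Python. It is an illustration under assumptions, not the course's implementation: the prompt wording, the 1-5 scale, and the names `JUDGE_TEMPLATE`, `judge_output`, and `stub_judge` are all hypothetical, and the model call is replaced by a stub so the example runs without any API.

```python
import json
from typing import Callable

# Hypothetical judge prompt; the wording and the 1-5 scale are assumptions.
JUDGE_TEMPLATE = """You are grading an AI agent's output.
Criteria: {criteria}
Agent output: {output}
Reply with JSON only: {{"score": <integer 1-5>, "reason": "<one sentence>"}}"""


def judge_output(output: str, criteria: str, call_model: Callable[[str], str]) -> dict:
    """Score one agent output against the given criteria using a judge model."""
    prompt = JUDGE_TEMPLATE.format(criteria=criteria, output=output)
    raw = call_model(prompt)      # in practice, a request to your judge model
    verdict = json.loads(raw)     # raises if the judge did not return valid JSON
    score = int(verdict["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"judge returned out-of-range score: {score}")
    return {"score": score, "reason": str(verdict.get("reason", ""))}


def stub_judge(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same verdict."""
    return json.dumps({"score": 4, "reason": "Mostly meets the criteria."})


result = judge_output(
    output="Your refund was issued on March 3.",
    criteria="The reply states the refund date and stays polite.",
    call_model=stub_judge,
)
print(result)
```

Passing the model call in as `call_model` keeps the scoring logic testable on its own: swap `stub_judge` for a function that calls whichever judge model you use, and the parsing and range check stay the same.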