The Agent Evaluation Revolution

The agent evaluation revolution

This video introduces a new series on testing AI

Agent Evaluation & Benchmarks - Agentic AI MOOC 2025 Lecture 4 Summary

This lecture discusses the critical shift from

Agent evaluation with ADK & Vertex AI | The Agent Factory Podcast

Learn how to effectively

Top 5 AI Agent Evaluation Tools (2025): Maxim AI, Langfuse, Arize | LLM Observability Comparison

The landscape of AI

AI Agent evaluation: A complete guide to measuring performance

Evaluating

Evaluating Agents and Assistants: The AI Conference

Jason Lopatecki, Co-Founder and CEO of Arize AI, dives into the world of

LLM as a Judge: Scaling AI Evaluation Strategies

How to Evaluate AI Agents: Comprehensive Strategies for Reliable, High‑Quality Agentic Systems

Evaluating

Agentic Evals by Shishir Patil

Shishir Patil, a Research Scientist at Meta, delivered a presentation on AI

Evaluating and Debugging Non-Deterministic AI Agents

Evaluate

Why AI evals are the hottest new skill for product builders | Hamel Husain & Shreya Shankar

Hamel Husain and Shreya Shankar teach the world's most popular course on AI evals and have trained over 2000 PMs and ...

Agent Behavior Evaluation | Evaluate AI Agent Value | Triage Agent Responses | Quiz

AWS re:Invent 2025 - Improve agent quality in production with Bedrock AgentCore Evaluations (AIM3348)

Amazon Bedrock AgentCore

How to evaluate agents in practice

Evaluating Agents

AI Agent Evaluation with RAGAS

RAGAS (RAG ASsessment) is an

How to evaluate agent trajectories with AgentEvals

Evaluating

Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation

... verbosity, self-enhancement bias 00:47:22 Best practices 00:54:06 Factuality 01:00:15