Coval Concepts
Test Sets
Test Sets define and evaluate your agent’s behavior across different scenarios and workflows.
By organizing transcripts into test sets, you can track performance across agent scenarios and identify patterns or issues. A test set consists of multiple records that fit a specific agent test case.
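To make the grouping idea concrete, here is a minimal sketch of organizing transcripts into test sets by scenario. The `Record` and `ScenarioTestSet` classes and the `group_into_test_sets` helper are hypothetical names chosen for illustration; they are not Coval's schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One transcript tied to a scenario (hypothetical shape, not Coval's schema)."""
    scenario: str
    transcript: str


@dataclass
class ScenarioTestSet:
    """A named collection of records that belong to one agent test case."""
    name: str
    records: list = field(default_factory=list)


def group_into_test_sets(records):
    """Group records by scenario so each test set covers a single test case."""
    test_sets = {}
    for record in records:
        if record.scenario not in test_sets:
            test_sets[record.scenario] = ScenarioTestSet(name=record.scenario)
        test_sets[record.scenario].records.append(record)
    return test_sets
```

Grouping by scenario keeps results comparable within a test case, which is what makes per-scenario trends visible.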
Metrics
Metrics in Coval are essential for tracking the performance and success of your agent interactions.
Key metrics to monitor include LLM binary questions such as “Was Goal X achieved?” or “Did the agent use neutral language?” Coval also provides out-of-the-box tool call analysis to help you assess agent efficiency.
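A binary question yields one yes/no answer per transcript, so a natural way to summarize it is a pass rate. The sketch below assumes the answers have already been produced by some judge; `pass_rate` is a hypothetical helper, not part of Coval.

```python
def pass_rate(answers):
    """Return the fraction of True answers to a single yes/no metric question.

    `answers` holds one boolean per evaluated transcript (an assumption
    made for this sketch).
    """
    if not answers:
        raise ValueError("pass_rate needs at least one answer")
    return sum(answers) / len(answers)


# Example: "Was the goal achieved?" answered for four transcripts.
goal_achieved = [True, False, True, True]
rate = pass_rate(goal_achieved)  # 3 of 4 transcripts passed
```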
Learn more about available metrics.
Simulators
Simulators define your end-user behavior and interactions with your agent.
While your test set describes how your agent functions and metrics measure its performance, simulators capture the real-world context and scenarios your agent will encounter. For example, if your agent is a doctor’s office receptionist, the simulator would define the patient’s behavior, such as calling reception to reschedule an appointment. Simulators help ensure your agent is tested against realistic user interactions.
Evaluations
Evaluations let you test your agent’s performance by running it against defined test sets, metrics, and simulators.
They help you identify issues, optimize behavior, and ensure consistent performance across workflows in real-world scenarios.
Scale plans have custom evaluation strategies and can have higher limits on evaluations per month.
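The way test sets, simulators, and metrics fit together in an evaluation can be sketched as a simple loop. Everything here is an assumption made for illustration: `run_evaluation`, `scripted_patient`, `stub_agent`, and the `agent_replied` metric are hypothetical stand-ins, not Coval's API.

```python
def run_evaluation(scenarios, simulator, agent, metrics):
    """Run each scenario through a simulated user and score the transcript.

    simulator(scenario) -> list of user messages
    agent(message)      -> agent reply
    metrics             -> dict of name -> function(transcript) -> bool
    """
    results = {}
    for scenario in scenarios:
        transcript = []  # list of (speaker, text) pairs
        for user_message in simulator(scenario):
            transcript.append(("user", user_message))
            transcript.append(("agent", agent(user_message)))
        results[scenario] = {
            name: metric(transcript) for name, metric in metrics.items()
        }
    return results


def scripted_patient(scenario):
    """Stub simulator: a patient who opens with a single request."""
    return [f"Hi, I need to {scenario}."]


def stub_agent(message):
    """Stub agent that always gives the same reply."""
    return "Sure, I can help with that."


metrics = {
    "agent_replied": lambda transcript: any(
        speaker == "agent" for speaker, _ in transcript
    ),
}

results = run_evaluation(
    ["reschedule an appointment"], scripted_patient, stub_agent, metrics
)
```

Passing the simulator, agent, and metrics in as plain functions keeps the three concepts independent, so any one of them can be swapped without touching the others.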
Monitoring
Set up monitoring with Coval to track and optimize your AI agent’s performance in production.
Quickly identify issues, evaluate behavior across scenarios, and ensure your agents meet their goals, improving reliability and efficiency in production. Once your CI/CD is configured, incoming transcripts are monitored automatically, giving you continuous insight into agent performance.
Set up custom alerts to be notified of performance issues, goal discrepancies, or anomalies in real time. This allows for proactive issue resolution and helps keep operations smooth as your agents scale.
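As an illustration of what a performance alert might check, the sketch below fires when the pass rate over a rolling window of recent transcripts falls below a threshold. The `PassRateAlert` class and its `window`, `threshold`, and `min_samples` parameters are hypothetical; Coval's actual alert configuration may look quite different.

```python
from collections import deque


class PassRateAlert:
    """Fire when the rolling pass rate drops below a threshold (illustrative only)."""

    def __init__(self, window=50, threshold=0.8, min_samples=10):
        self.results = deque(maxlen=window)  # keeps only the most recent results
        self.threshold = threshold
        self.min_samples = min_samples

    def observe(self, passed):
        """Record one metric result and return True if an alert should fire."""
        self.results.append(bool(passed))
        if len(self.results) < self.min_samples:
            return False  # too few samples to judge
        rate = sum(self.results) / len(self.results)
        return rate < self.threshold
```

Requiring a minimum number of samples avoids alerting on the first one or two failed transcripts after a deploy.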