Metrics
Understand and analyze your AI agents’ performance with Coval’s comprehensive metrics
Understanding Coval Metrics
Metrics are the tools you use to evaluate voice and chat agents. They act as judges of conversation quality: each metric analyzes call transcripts and/or recordings and returns an evaluation against specific criteria. Metrics help you identify strengths and areas for improvement, and reviewing them regularly supports data-driven decisions that improve agent performance and user satisfaction.
A metric is a measurable criterion used to evaluate performance, defined by clear objectives, evaluation criteria, and prompts tailored to assess specific behaviors or outcomes.
Types of Metrics
The metrics that you define can be run on both simulated calls and live calls that you push to Observability.
The most important metrics to assess agent performance are:
- LLM-Judge metrics: Define custom prompts that evaluate your agents against the steps they are required to perform and the outcomes you expect
- Binary Tool Call metrics: Check if your tool calls (functions) have been performed correctly
- Categorical metrics: Define a set of categories or topics and use them to filter your conversations (good for exploratory call analysis)
- Audio metrics (voice agents only): Measure audio performance
- Audio LLM-Judge metrics: While LLM-Judge metrics take the transcript as input, Audio LLM-Judge metrics take the audio as input. You can define prompts that ask about things like sentiment
- Custom: If you have your own metrics that you want to upload to the Coval platform and run alongside our built-in metrics, let us know.
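To make the idea of a binary tool call metric concrete, here is a minimal sketch of the kind of check such a metric performs: given the tool calls recorded for a conversation, it returns a pass/fail result for one expected call. This is an illustration only, not Coval's implementation. The `ToolCall` shape, the `tool_call_performed` function, and the `book_appointment` tool name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ToolCall:
    """One function call made by the agent (hypothetical shape)."""
    name: str
    arguments: Dict[str, Any] = field(default_factory=dict)


def tool_call_performed(
    calls: List[ToolCall],
    expected_name: str,
    required_args: Optional[Dict[str, Any]] = None,
) -> bool:
    """Return True if any call has the expected name and required arguments."""
    required = required_args or {}
    for call in calls:
        if call.name != expected_name:
            continue
        # Every required argument must be present with the expected value.
        if all(call.arguments.get(key) == value for key, value in required.items()):
            return True
    return False


# Hypothetical tool calls extracted from one conversation.
recorded = [ToolCall("book_appointment", {"date": "2024-05-01", "time": "10:00"})]
passed = tool_call_performed(recorded, "book_appointment", {"date": "2024-05-01"})
```

The result is binary by design: the call either happened with the required arguments or it did not, which makes this kind of metric easy to aggregate across many simulated or live calls.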
Create a Metric
Add a Display Name: Give your metric a clear, descriptive name.
Select Manager Type: Choose what type of metric you want to define.
Question: Define the specific goal you want your agent to achieve. When the question is about the agent, always refer to it as the “assistant”.
Example for an LLM-as-a-Judge Question Prompt:
“Given the transcript, does the assistant successfully schedule the appointment according to the user’s preferences?”
Description: Provide an internal description for better clarity and context.
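The fields above can be pictured as a small record. The sketch below is a hypothetical representation for drafting and reviewing metrics before you enter them in the platform; it is not a Coval API. The `MetricDefinition` class, its field names, and the `validate` helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical container mirroring the fields described above."""
    display_name: str
    metric_type: str
    question: str
    description: str = ""


def validate(metric: MetricDefinition) -> List[str]:
    """Return a list of problems found in a draft metric (empty if none)."""
    problems = []
    if not metric.display_name.strip():
        problems.append("display name is empty")
    if not metric.question.strip():
        problems.append("question is empty")
    elif "assistant" not in metric.question.lower():
        # The guidance above says to refer to the agent as the "assistant".
        problems.append('question does not refer to the "assistant"')
    return problems


draft = MetricDefinition(
    display_name="Appointment scheduled correctly",
    metric_type="LLM-Judge",
    question=(
        "Given the transcript, does the assistant successfully schedule "
        "the appointment according to the user's preferences?"
    ),
    description="Checks that the booking matches the caller's stated preferences.",
)
issues = validate(draft)
```

A check like this is only a drafting aid: it catches an empty name or a question that never mentions the “assistant”, but it says nothing about whether the question itself is a good evaluation prompt.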
Need custom metrics tailored to your needs? Contact us, and we’ll create them for you.