Walkthrough
Overview
Custom Trace Metrics let you extract a specific numerical value from your agent’s OpenTelemetry spans and aggregate it across all turns in a simulation.
Use Custom Trace Metrics when you have a signal already captured in your traces — latency measurements, confidence scores, token counts, retry attempts — that you want to track and trend across runs.
Prerequisites
Your agent must be instrumented with OpenTelemetry and sending spans to Coval. See the OpenTelemetry Traces guide for setup instructions. If traces are not present for a simulation, the metric will report an error at execution time.
Configuration
When creating a Custom Trace Metric, configure three fields:
| Field | Description |
|---|---|
| Span Name | The name of the OTel span to query (e.g. llm, tts, stt, llm_tool_call, or any custom span name you emit). |
| Metric Attribute | The span attribute to extract the value from (e.g. retrieval_latency_ms, confidence_score, or another custom numeric attribute key). |
| Aggregation Method | How to aggregate the extracted values across all matching spans in the simulation. |
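To make the three fields concrete, the sketch below models how they might combine: select spans by name, read one numeric attribute from each, and reduce the values to a single number. It is an illustration only, not Coval's implementation. The `Span` dataclass and `extract_metric` function are hypothetical names, and skipping spans that lack the attribute is an assumption.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Sequence


@dataclass
class Span:
    """Hypothetical, simplified stand-in for an exported OTel span."""
    name: str
    attributes: dict = field(default_factory=dict)


def extract_metric(
    spans: Sequence[Span],
    span_name: str,
    metric_attribute: str,
    aggregate: Callable[[Sequence[float]], float],
) -> float:
    """Filter by span name, pull one numeric attribute, then aggregate."""
    values = [
        float(s.attributes[metric_attribute])
        for s in spans
        if s.name == span_name and metric_attribute in s.attributes
    ]
    if not values:
        # Assumption: having no matching values is treated as an error.
        raise ValueError(f"no '{metric_attribute}' values on '{span_name}' spans")
    return aggregate(values)


spans = [
    Span("document_retrieval", {"retrieval_latency_ms": 120.0}),
    Span("llm", {"confidence_score": 0.9}),
    Span("document_retrieval", {"retrieval_latency_ms": 180.0}),
]
result = extract_metric(spans, "document_retrieval", "retrieval_latency_ms", mean)
print(result)  # 150.0
```

Only the two `document_retrieval` spans contribute here; the `llm` span is ignored because its name does not match.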
Aggregation Methods
| Method | Description |
|---|---|
| Average | Mean value across all matching spans. Best for typical-case latency or scores. |
| Median | Median value across all matching spans. More robust to outliers than average. |
| p90 | 90th-percentile value: 90% of matching spans fall at or below it. Best for understanding tail performance without letting a single extreme outlier dominate. |
| Max | Maximum value observed across all matching spans. Useful for worst-case detection. |
| Min | Minimum value observed across all matching spans. |
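The difference between these methods is easiest to see on a small sample with one outlier. The following is a minimal sketch using made-up latency values. The `p90` helper uses the nearest-rank definition, which is an assumption: the percentile interpolation Coval applies is not specified here, and other definitions can give slightly different results.

```python
from statistics import mean, median


def p90(values):
    """90th percentile by the nearest-rank method (assumed definition)."""
    ordered = sorted(values)
    # ceil(0.9 * n) computed with integers to avoid float rounding.
    rank = (9 * len(ordered) + 9) // 10  # 1-based rank
    return ordered[rank - 1]


AGGREGATIONS = {
    "Average": mean,
    "Median": median,
    "p90": p90,
    "Max": max,
    "Min": min,
}

# Ten hypothetical latency samples (ms) with one large outlier.
latencies_ms = [100, 105, 110, 110, 115, 120, 120, 125, 130, 900]

summary = {name: fn(latencies_ms) for name, fn in AGGREGATIONS.items()}
for name, value in summary.items():
    print(f"{name}: {value}")
# Average: 193.5
# Median: 117.5
# p90: 130
# Max: 900
# Min: 100
```

The single 900 ms sample pulls the average up to 193.5 ms, while the median stays at 117.5 ms. Max reports the outlier itself.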
Span Names
Any span name your agent emits can be queried. The following well-known span names map to Coval’s built-in trace components:
| Span Name | Component |
|---|---|
| llm | Language model invocations |
| tts | Speech synthesis |
| stt | Speech recognition |
| llm_tool_call | Individual tool/function calls |
| turn | A single conversation turn |
Custom span names (e.g. document_retrieval, database_lookup) work as well — use whatever names your agent emits.
How to Create
1. Open the Metrics page. Navigate to the Metrics section in the Coval dashboard.
2. Click Create Metric. Select Custom Trace Metrics from the metric type group.
3. Configure the metric. Fill in Span Name, Metric Attribute, and Aggregation Method for your use case.
4. Name and save. Give the metric a descriptive name and save. It is now available to add to any run.
Use Cases
Custom Latency Tracking
Extract average document retrieval latency from your custom retrieval spans:
| Field | Value |
|---|---|
| Span Name | document_retrieval |
| Metric Attribute | retrieval_latency_ms |
| Aggregation Method | Average |
This gives you the average retrieval latency across all turns in the simulation. Compare it across runs to catch regressions after changes to your index, embeddings, or chunking strategy.
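A cross-run comparison can be as simple as a relative-change check. The sketch below assumes you have already read the metric value for two runs; the function names, the 10% tolerance, and the sample values are all hypothetical and not part of Coval.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline to current (0.10 means +10%)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline


def is_regression(baseline: float, current: float, tolerance: float = 0.10) -> bool:
    """Flag a run whose latency metric rose by more than `tolerance`."""
    return relative_change(baseline, current) > tolerance


# Hypothetical average retrieval_latency_ms values from two runs.
baseline_ms = 150.0
after_reindex_ms = 180.0

change = relative_change(baseline_ms, after_reindex_ms)
print(f"{change:+.0%}")  # +20%
print(is_regression(baseline_ms, after_reindex_ms))  # True
```

A tolerance is useful because latency varies between runs even when nothing has changed; pick a threshold that matches the noise you normally observe.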
p90 External API Latency
Track tail latency for an external service your agent depends on:
| Field | Value |
|---|---|
| Span Name | weather_api |
| Metric Attribute | duration_ms |
| Aggregation Method | p90 |
Use p90 rather than average when tail performance matters more than typical performance, especially for services that occasionally spike.
If your agent emits custom spans for specific tool calls with a duration attribute:
| Field | Value |
|---|---|
| Span Name | database_lookup |
| Metric Attribute | duration_ms |
| Aggregation Method | Average |
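A `duration_ms` attribute like the one above has to be measured by your agent before it is attached to the span. The sketch below shows one way to do that using only the standard library. The `timed_ms` helper and the plain `attributes` dict are hypothetical stand-ins; in a real agent you would pass the measured value to your OTel span's `set_attribute` call instead.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def timed_ms(record: Callable[[float], None]) -> Iterator[None]:
    """Measure the wrapped block and report elapsed wall time in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so failed lookups are still timed.
        record((time.perf_counter() - start) * 1000.0)


# Stand-in for span attributes; with OpenTelemetry you would instead call
# span.set_attribute("duration_ms", ms) on the active span.
attributes: dict = {}

with timed_ms(lambda ms: attributes.__setitem__("duration_ms", ms)):
    time.sleep(0.01)  # placeholder for the real database lookup

print(f"duration_ms = {attributes['duration_ms']:.2f}")
```

Recording the value as a float in milliseconds keeps it consistent with the attribute name and gives the aggregation a numeric value to work with.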
If your agent records a confidence score on each language model span:
| Field | Value |
|---|---|
| Span Name | llm |
| Metric Attribute | confidence_score |
| Aggregation Method | Average |
Custom Trace Metrics complement built-in trace metrics like LLM Time to First Byte and TTS Time to First Byte. Use the built-in metrics for standard pipeline components and Custom Trace Metrics for signals specific to your agent’s instrumentation.