

Overview

Custom Trace Metrics let you extract a specific numerical value from your agent’s OpenTelemetry spans and aggregate it across all turns in a simulation. Use Custom Trace Metrics when you have a signal already captured in your traces — latency measurements, confidence scores, token counts, retry attempts — that you want to track and trend across runs.

Prerequisites

Your agent must be instrumented with OpenTelemetry and sending spans to Coval. See the OpenTelemetry Traces guide for setup instructions. If traces are not present for a simulation, the metric will report an error at execution time.

Configuration

When creating a Custom Trace Metric, configure three fields:
| Field | Description |
|---|---|
| Span Name | The name of the OTel span to query (e.g. llm, tts, stt, llm_tool_call, or any custom span name you emit). |
| Metric Attribute | The span attribute to extract the value from (e.g. retrieval_latency_ms, confidence_score, or another custom numeric attribute key). |
| Aggregation Method | How to aggregate the extracted values across all matching spans in the simulation. |
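To make the first two fields concrete, here is a minimal Python sketch of the extraction step. It assumes a hypothetical in-memory span shape (a dict with `name` and `attributes` keys) and a hypothetical `extract_values` helper; it is not Coval's implementation, and both the exact-match rule and the skipping of spans that lack the attribute are assumptions.

```python
from typing import Any, Mapping, Sequence

# Hypothetical span shape for this sketch: {"name": str, "attributes": {key: value}}.
Span = Mapping[str, Any]


def extract_values(spans: Sequence[Span], span_name: str, metric_attribute: str) -> list[float]:
    """Return the numeric `metric_attribute` value from every span named `span_name`."""
    values: list[float] = []
    for span in spans:
        # Assumption: Span Name is an exact, case-sensitive match.
        if span.get("name") != span_name:
            continue
        value = span.get("attributes", {}).get(metric_attribute)
        # Assumption: spans missing the attribute, or holding a non-numeric value, are skipped.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            values.append(float(value))
    return values


spans = [
    {"name": "document_retrieval", "attributes": {"retrieval_latency_ms": 42}},
    {"name": "llm", "attributes": {"confidence_score": 0.91}},
    {"name": "document_retrieval", "attributes": {"retrieval_latency_ms": 58.5}},
    {"name": "document_retrieval", "attributes": {}},
]

print(extract_values(spans, "document_retrieval", "retrieval_latency_ms"))  # [42.0, 58.5]
```

The list this returns is what the Aggregation Method is then applied to.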

Aggregation Methods

| Method | Description |
|---|---|
| Average | Mean value across all matching spans. Best for typical-case latency or scores. |
| Median | Median value across all matching spans. More robust to outliers than the average. |
| p90 | 90th-percentile value across all matching spans. Best for understanding tail performance at scale. |
| Max | Maximum value observed across all matching spans. Useful for worst-case detection. |
| Min | Minimum value observed across all matching spans. |
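A minimal sketch of the five methods over a list of extracted values, using Python's `statistics` module. The `aggregate` name is hypothetical, and the documentation does not say which percentile convention Coval uses for p90; this sketch assumes the `inclusive` interpolation method of `statistics.quantiles`, so exact p90 values may differ from what the dashboard reports.

```python
import statistics


def aggregate(values: list[float], method: str) -> float:
    """Aggregate extracted span values with one of the five documented methods."""
    if not values:
        raise ValueError("no matching span values to aggregate")
    if method == "average":
        return statistics.fmean(values)
    if method == "median":
        return statistics.median(values)
    if method == "p90":
        if len(values) == 1:
            return values[0]  # quantiles() needs two data points before Python 3.13
        # n=10 yields nine cut points (deciles); index 8 is the 90th percentile.
        return statistics.quantiles(values, n=10, method="inclusive")[8]
    if method == "max":
        return max(values)
    if method == "min":
        return min(values)
    raise ValueError(f"unknown aggregation method: {method!r}")


latencies = [120.0, 80.0, 100.0, 300.0, 90.0]
for method in ("average", "median", "p90", "max", "min"):
    print(method, aggregate(latencies, method))
# average 138.0
# median 100.0
# p90 228.0
# max 300.0
# min 80.0
```

Note how the single 300 ms outlier pulls the average above the median and dominates p90 and Max in a sample this small.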

Span Names

Any span name your agent emits can be queried. The following well-known span names map to Coval’s built-in trace components:
| Span Name | Component |
|---|---|
| llm | Language model invocations |
| tts | Speech synthesis |
| stt | Speech recognition |
| llm_tool_call | Individual tool/function calls |
| turn | A single conversation turn |

Custom span names (e.g. document_retrieval, database_lookup) work as well; use whatever names your agent emits.
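Because any emitted span name can be queried, it can help to inventory what your exported spans actually contain before filling in Span Name and Metric Attribute. This sketch assumes the same kind of hypothetical dict-based span shape (a `name` plus an `attributes` mapping); `inventory` is an illustrative helper, not part of Coval, and the attribute names in the sample data are made up.

```python
from collections import Counter, defaultdict


def inventory(spans):
    """Count spans per name and collect the numeric attribute keys seen on each name."""
    name_counts = Counter(span["name"] for span in spans)
    numeric_keys = defaultdict(set)
    for span in spans:
        for key, value in span.get("attributes", {}).items():
            # Only numeric attributes are candidates for a Metric Attribute.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numeric_keys[span["name"]].add(key)
    return name_counts, dict(numeric_keys)


spans = [
    {"name": "turn", "attributes": {"turn_index": 0}},
    {"name": "llm", "attributes": {"model": "example-model", "confidence_score": 0.87}},
    {"name": "document_retrieval", "attributes": {"retrieval_latency_ms": 41.0, "index": "faq"}},
    {"name": "llm", "attributes": {"confidence_score": 0.93}},
]

counts, keys = inventory(spans)
print(counts["llm"])                       # 2
print(sorted(keys["document_retrieval"]))  # ['retrieval_latency_ms']
```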

How to Create

1. Open the Metrics page. Navigate to the Metrics section in the Coval dashboard.
2. Click Create Metric. Select Custom Trace Metrics from the metric type group.
3. Configure the metric. Fill in Span Name, Metric Attribute, and Aggregation Method for your use case.
4. Name and save. Give the metric a descriptive name and save it. It is now available to add to any run.

Use Cases

Custom Latency Tracking

Extract average document retrieval latency from your custom retrieval spans:
| Field | Value |
|---|---|
| Span Name | document_retrieval |
| Metric Attribute | retrieval_latency_ms |
| Aggregation Method | Average |
This gives you the average retrieval latency across all turns in the simulation. Compare it across runs to catch regressions after changes to your index, embeddings, or chunking strategy.
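Comparing the metric across runs can be as simple as a relative threshold. This sketch is illustrative only: the per-run averages are made-up numbers, and the `is_regression` helper and its 10% default tolerance are hypothetical, not a Coval feature.

```python
def is_regression(baseline: float, candidate: float, tolerance: float = 0.10) -> bool:
    """Return True when `candidate` exceeds `baseline` by more than `tolerance` (a fraction)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return candidate > baseline * (1 + tolerance)


# Made-up average retrieval_latency_ms values for two runs.
runs = {"before-rechunking": 48.0, "after-rechunking": 61.0}

print(is_regression(runs["before-rechunking"], runs["after-rechunking"]))  # True
print(is_regression(48.0, 50.0))  # False
```

This assumes lower is better, as with latency; for a metric where higher is better, such as a confidence score, the comparison would flip.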

p90 External API Latency

Track tail latency for an external service your agent depends on:
| Field | Value |
|---|---|
| Span Name | weather_api |
| Metric Attribute | duration_ms |
| Aggregation Method | p90 |
Use p90 instead of average when you care about tail performance rather than typical performance, especially for services that occasionally spike.
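A worked example of why the tail matters, using made-up durations: 16 fast calls at 100 ms and 4 spikes at 1500 ms. The p90 is computed with the `inclusive` method of Python's `statistics.quantiles`; that convention is an assumption, since Coval's exact percentile method is not specified here.

```python
import statistics

# Made-up weather_api duration_ms values: most calls are fast, a few spike.
durations_ms = [100.0] * 16 + [1500.0] * 4

average = statistics.fmean(durations_ms)
p90 = statistics.quantiles(durations_ms, n=10, method="inclusive")[8]

print(f"average={average:.0f} ms, p90={p90:.0f} ms")  # average=380 ms, p90=1500 ms
```

The average suggests calls finish well under half a second, while p90 shows that at least one call in ten takes 1.5 seconds.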

Tool Call Duration Monitoring

If your agent emits custom spans for specific tool calls with a duration attribute:
| Field | Value |
|---|---|
| Span Name | database_lookup |
| Metric Attribute | duration_ms |
| Aggregation Method | Average |
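One way to produce such a duration attribute is to time the tool call and record the result on the span. To stay dependency-free, this sketch appends span-like dicts to a list instead of exporting real spans; `timed_span`, `recorded_spans`, `database_lookup`, and the `customer_id` attribute are all hypothetical. With the OpenTelemetry Python API you would instead open a span with `tracer.start_as_current_span("database_lookup")` and call `span.set_attribute("duration_ms", ...)` on it.

```python
import time
from contextlib import contextmanager

# Stand-in for an OTel exporter: finished span-like dicts are appended here.
recorded_spans: list[dict] = []


@contextmanager
def timed_span(name: str):
    """Time the enclosed block and record a span-like dict with a duration_ms attribute."""
    attributes: dict = {}
    start = time.perf_counter()
    try:
        yield attributes
    finally:
        # Runs even if the block raises, so failed calls are still timed.
        attributes["duration_ms"] = (time.perf_counter() - start) * 1000.0
        recorded_spans.append({"name": name, "attributes": attributes})


def database_lookup(customer_id: str) -> dict:
    """Hypothetical tool call; the sleep simulates a slow query."""
    with timed_span("database_lookup") as attributes:
        attributes["customer_id"] = customer_id
        time.sleep(0.02)
        return {"customer_id": customer_id, "status": "active"}


database_lookup("c-123")
print(recorded_spans[-1]["name"], round(recorded_spans[-1]["attributes"]["duration_ms"]))
```

Recording the duration as an explicit attribute matters here because a Custom Trace Metric reads its value from the span attribute named in Metric Attribute.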

Confidence Score Extraction

If your agent records a confidence score on each language model span:
| Field | Value |
|---|---|
| Span Name | llm |
| Metric Attribute | confidence_score |
| Aggregation Method | Average |
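Since Metric Attribute must point at a numeric attribute, it can be worth validating the score before attaching it to the span. This sketch assumes scores are on a 0 to 1 scale and that invalid scores should simply be omitted; `confidence_attribute` is a hypothetical helper, and in the OpenTelemetry Python API the returned mapping could be passed to `span.set_attributes(...)`.

```python
import math


def confidence_attribute(score: object) -> dict[str, float]:
    """Return {"confidence_score": score} for a finite score in [0, 1], else an empty dict."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return {}
    if not math.isfinite(score) or not 0.0 <= score <= 1.0:
        return {}
    return {"confidence_score": float(score)}


print(confidence_attribute(0.82))          # {'confidence_score': 0.82}
print(confidence_attribute(float("nan")))  # {}
print(confidence_attribute("high"))        # {}
```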
Custom Trace Metrics complement built-in trace metrics like LLM Time to First Byte and TTS Time to First Byte. Use the built-in metrics for standard pipeline components and Custom Trace Metrics for signals specific to your agent’s instrumentation.