OpenTelemetry Traces

Beta Feature — Tracing with OpenTelemetry is currently in beta and under active development. Functionality and APIs may change as we continue to improve the experience.

You can send traces from your agent to Coval using the OpenTelemetry SDK. This lets you capture detailed span data — such as tool calls, LLM invocations, and other operations — and export it directly to Coval for analysis alongside your simulation or conversation results. Tracing works for both simulations (where Coval calls your agent) and conversations (where you submit post-hoc call data). The setup differs only in how you identify the call — everything else (instrumentation, span naming, viewing) is the same.

New to tracing? If you’re using Pipecat, LiveKit, or Vapi, the Coval Wizard (Beta) can instrument your agent automatically with one command: npx @coval/wizard

Already using Langfuse? Skip instrumenting a second SDK — connect your Langfuse account once and Coval imports traces automatically for each simulation. See Import Traces from Langfuse.

Already using Arize Phoenix? Connect your Phoenix project once in Settings and Coval pulls spans after each simulation — Phoenix is OTel-native, so the integration is a thin fetch. See Import Traces from Arize Phoenix.

Prerequisites

A Coval account with an API key (manage your keys)
A simulation output ID (for simulations) or a conversation ID (for conversations)
Python 3.8+ with the OpenTelemetry SDK installed

Install the required packages:

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Configuration

Configure the OpenTelemetry tracer provider to export spans to Coval’s trace ingestion endpoint:

from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Configure tracer
resource = Resource.create({SERVICE_NAME: "my-agent"})
provider = trace_sdk.TracerProvider(resource=resource)

exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Simulation-Id": "<simulation_id>",
    },
    timeout=30,
)

provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("my-agent")

Parameter	Description
`endpoint`	Coval’s OTLP trace ingestion URL: `https://api.coval.dev/v1/traces`
`X-API-Key`	Your Coval API key
`X-Simulation-Id`	The simulation output ID for the individual call being traced. This is per-simulation-call, not the run ID.
`timeout`	Export timeout in seconds. Must be set to `30` (see note below)
`SERVICE_NAME`	A name identifying your agent service

The timeout parameter must be set to 30 seconds to ensure spans are exported reliably. We are working on reducing this requirement in a future update.
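Because every exporter in this guide sends the same API key plus exactly one call identifier, it can help to build the headers in one place. The following is a minimal sketch, not part of any Coval SDK: `build_coval_headers` and `COVAL_TRACES_ENDPOINT` are hypothetical names, and the only facts taken from this page are the header names and the endpoint URL.

```python
from typing import Dict, Optional

# Endpoint as documented above; the constant name itself is illustrative.
COVAL_TRACES_ENDPOINT = "https://api.coval.dev/v1/traces"


def build_coval_headers(
    api_key: str,
    simulation_id: Optional[str] = None,
    conversation_id: Optional[str] = None,
) -> Dict[str, str]:
    """Return OTLP exporter headers carrying exactly one call identifier."""
    if not api_key:
        raise ValueError("api_key is required")
    # Simulations use X-Simulation-Id, conversations use X-Conversation-Id.
    if bool(simulation_id) == bool(conversation_id):
        raise ValueError("pass exactly one of simulation_id or conversation_id")
    headers = {"X-API-Key": api_key}
    if simulation_id:
        headers["X-Simulation-Id"] = simulation_id
    else:
        headers["X-Conversation-Id"] = str(conversation_id)
    return headers


print(build_coval_headers("example-key", simulation_id="sim-output-123"))
```

The returned dict can be passed as the `headers` argument of `OTLPSpanExporter`, alongside `endpoint=COVAL_TRACES_ENDPOINT` and `timeout=30`.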

Getting the Simulation Output ID

The X-Simulation-Id header must be set to the simulation output ID for the specific call you’re tracing. The simulation output ID is a per-call identifier — different from the run ID. Here’s how to obtain it at runtime.

Inbound voice agents

When Coval places an inbound call, it passes the simulation output ID as a SIP header: X-Coval-Simulation-Id. Read this header when the call arrives and use it to configure your OTLP exporter.

# Example: reading the simulation output ID from a SIP header
# In your call.initiated webhook handler (Telnyx example):
simulation_id = next(
    h["value"] for h in event["payload"]["sip_headers"]
    if h["name"] == "X-Coval-Simulation-Id"
)

exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Simulation-Id": simulation_id,
    },
    timeout=30,
)

See the Inbound Voice guide for provider-specific instructions on reading SIP headers (Twilio SIP trunking, Telnyx, etc.).
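The `next(...)` lookup above raises `StopIteration` when the header is absent, for example on a call that Coval did not place. A more defensive lookup might look like the sketch below. It assumes the same Telnyx-style payload shape as the example above; `find_sip_header` is a hypothetical helper, and the case-insensitive match reflects the fact that SIP header names are case-insensitive.

```python
from typing import Any, Dict, List, Optional


def find_sip_header(sip_headers: List[Dict[str, Any]], name: str) -> Optional[str]:
    """Return the value of the first header matching `name`, ignoring case."""
    wanted = name.lower()
    for header in sip_headers:
        if str(header.get("name", "")).lower() == wanted:
            return header.get("value")
    return None


# Sample event shaped like the Telnyx example above (values are made up).
event = {
    "payload": {
        "sip_headers": [
            {"name": "User-Agent", "value": "example"},
            {"name": "x-coval-simulation-id", "value": "sim-output-123"},
        ]
    }
}

simulation_id = find_sip_header(
    event["payload"].get("sip_headers", []), "X-Coval-Simulation-Id"
)
if simulation_id is None:
    # Not a Coval-initiated call: skip trace export instead of crashing.
    print("no simulation ID found; tracing disabled for this call")
else:
    print(f"tracing call with simulation output ID {simulation_id}")
```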

Twilio Programmable Voice (PSTN) — Standard Twilio phone numbers route over the public telephone network, which strips SIP headers. Use the pre_call_webhook_url agent config instead: Coval will POST the simulation ID to your agent before dialing. See the Twilio ConversationRelay guide.

Outbound voice agents

Coval’s outbound trigger POST can include the simulation output ID in the request payload. Add simulation_output_id to your trigger_call_payload configuration in your template, then read it when your webhook receives the trigger and use it to configure the exporter.

You can also find simulation output IDs in the Coval dashboard under any run’s results, or via the Coval API.

Tracing for Conversations

For conversations (post-hoc call evaluation), there is no Coval-initiated call, so there is no simulation output ID available at call time. Instead, you use a conversation ID to associate traces with a conversation. The conversation ID is only available after the call ends and you submit the transcript to Coval — which means you can’t configure the OTLP exporter up front. The solution is to buffer spans in memory during the call, then flush them once you have the ID.

Buffer spans during the call

Use InMemorySpanExporter (included in opentelemetry-sdk) to hold spans locally during the call instead of exporting them in real time.

from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

resource = Resource.create({SERVICE_NAME: "my-agent"})
in_memory_exporter = InMemorySpanExporter()
provider = trace_sdk.TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(in_memory_exporter))
tracer = provider.get_tracer("my-agent")

# Instrument your agent as normal — spans accumulate in memory
with tracer.start_as_current_span("llm") as span:
    span.set_attribute("metrics.ttfb", 0.42)
    response = call_llm()

Submit the conversation after the call ends

Post the transcript (and optionally audio) to POST /v1/conversations:submit. The response contains the conversation_id you need for trace export.

import requests

response = requests.post(
    "https://api.coval.dev/v1/conversations:submit",
    headers={
        "x-api-key": "<COVAL_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "transcript": [
            {"role": "user", "content": "Hello", "start_time": 0.0, "end_time": 1.2},
            {"role": "assistant", "content": "Hi! How can I help?", "start_time": 1.5, "end_time": 3.0},
        ],
    },
)
conversation_id = response.json()["conversation"]["conversation_id"]

See POST /v1/conversations:submit for the full request schema including optional audio, metadata, and metrics fields.

If your recording URL isn’t available at call end (common with Twilio Programmable Voice in multi-replica deployments), submit the transcript now to get a conversation_id for trace correlation, then attach the audio later with PATCH /v1/conversations/{conversation_id}. Text-only metrics fire after submit, audio metrics fire after PATCH.

Export the buffered spans

Create an OTLP exporter with X-Conversation-Id and flush the buffered spans to Coval.

from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

otlp_exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Conversation-Id": conversation_id,
    },
    timeout=30,
)

finished_spans = in_memory_exporter.get_finished_spans()
if finished_spans:
    otlp_exporter.export(list(finished_spans))

Parameter	Description
`X-Conversation-Id`	The `conversation_id` returned by `POST /v1/conversations:submit`. Use this instead of `X-Simulation-Id`.

Traces can be sent immediately after submitting a conversation — no delay is needed.

Full conversation tracing example

import requests
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

COVAL_API_KEY = "<COVAL_API_KEY>"

# --- Call setup: buffer spans in memory ---
resource = Resource.create({SERVICE_NAME: "my-agent"})
in_memory_exporter = InMemorySpanExporter()
provider = trace_sdk.TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(in_memory_exporter))
tracer = provider.get_tracer("my-agent")

# --- During the call: instrument as normal ---
with tracer.start_as_current_span("llm") as span:
    span.set_attribute("metrics.ttfb", 0.42)
    response = call_llm()

with tracer.start_as_current_span("tts") as span:
    span.set_attribute("metrics.ttfb", 0.18)
    audio = synthesize_speech(response)

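# --- After the call ends: submit transcript, then export spans ---
# `transcript` is the list of turn dicts collected during the call,
# in the same shape as the submit example above.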
submit_response = requests.post(
    "https://api.coval.dev/v1/conversations:submit",
    headers={"x-api-key": COVAL_API_KEY, "Content-Type": "application/json"},
    json={"transcript": transcript},
)
conversation_id = submit_response.json()["conversation"]["conversation_id"]

otlp_exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={"X-API-Key": COVAL_API_KEY, "X-Conversation-Id": conversation_id},
    timeout=30,
)
finished_spans = in_memory_exporter.get_finished_spans()
if finished_spans:
    otlp_exporter.export(list(finished_spans))

Uploading Traces via the Dashboard

You can also upload traces directly from the Coval dashboard without using the SDK. On the Conversations page, click Upload to Conversations, then:

Add your audio file or transcript as usual
In the Traces (Optional) section, select your OTLP traces JSON file (must contain a resourceSpans array)
Click Upload — the conversation and traces are submitted together

This is useful for testing, debugging, or uploading historical traces that were captured separately.
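Before uploading, you may want to confirm that a file has the expected shape. The sketch below checks for the top-level `resourceSpans` array and counts spans. It assumes the file follows the standard OTLP/JSON layout (`resourceSpans`, then `scopeSpans`, then `spans`); this page only states the `resourceSpans` requirement, and `count_otlp_spans` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def count_otlp_spans(document: Dict[str, Any]) -> int:
    """Validate the top-level shape of an OTLP/JSON trace document and count spans."""
    resource_spans = document.get("resourceSpans")
    if not isinstance(resource_spans, list):
        raise ValueError("expected a top-level 'resourceSpans' array")
    total = 0
    for resource_span in resource_spans:
        for scope_span in resource_span.get("scopeSpans", []):
            total += len(scope_span.get("spans", []))
    return total


# A tiny hand-written document; IDs and timestamps are made up.
sample = json.loads("""
{
  "resourceSpans": [
    {
      "resource": {"attributes": [
        {"key": "service.name", "value": {"stringValue": "my-agent"}}
      ]},
      "scopeSpans": [
        {"scope": {"name": "my-agent"},
         "spans": [
           {"name": "llm",
            "traceId": "5b8efff798038103d269b633813fc60c",
            "spanId": "eee19b7ec3c1b174",
            "startTimeUnixNano": "1700000000000000000",
            "endTimeUnixNano": "1700000000420000000"}
         ]}
      ]
    }
  ]
}
""")

span_count = count_otlp_spans(sample)
print(f"{span_count} span(s) found")
```

For a file on disk, load it with `json.load(open(path))` and pass the result to the same function.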

Payload Limits & Batching

A single export request to /v1/traces has a size limit. Large buffered exports — most commonly the end-of-call flush in the conversation flow above — can exceed it and fail with 413 Request Entity Too Large. Keep each export request under roughly 3–4 MB. Treat this as a practical target, not a fixed contract: stay comfortably below it rather than tuning to an exact boundary.

Splitting spans across requests

You can split one call’s spans across multiple export requests. Every request carrying the same X-Conversation-Id (or X-Simulation-Id) is merged server-side into a single trace, reconstructed from each span’s parent/child relationships. There is no ordering requirement between requests. The simplest way to stay under the limit is BatchSpanProcessor with a bounded batch size, which chunks exports for you:

from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider.add_span_processor(
    BatchSpanProcessor(otlp_exporter, max_export_batch_size=512)
)

Lower max_export_batch_size if your spans carry large attributes such as full transcripts or prompts.
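Note that `max_export_batch_size` bounds the number of spans per request, not the number of bytes. If you flush a buffer manually, one option is to chunk by estimated size instead. The sketch below is illustrative only: `chunk_by_size` and `MAX_REQUEST_BYTES` are hypothetical names, the 3 MB default is simply the lower end of the guidance above, and the demo uses plain dicts as stand-ins for spans.

```python
import json
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Lower end of the "roughly 3-4 MB" guidance; an assumption, not a contract.
MAX_REQUEST_BYTES = 3 * 1024 * 1024


def chunk_by_size(
    items: Sequence[T],
    size_of: Callable[[T], int],
    max_bytes: int = MAX_REQUEST_BYTES,
) -> Iterator[List[T]]:
    """Yield consecutive chunks whose estimated total size is at most max_bytes.

    An item that exceeds max_bytes on its own is emitted as a single-item chunk.
    """
    chunk: List[T] = []
    chunk_bytes = 0
    for item in items:
        item_bytes = size_of(item)
        if chunk and chunk_bytes + item_bytes > max_bytes:
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(item)
        chunk_bytes += item_bytes
    if chunk:
        yield chunk


def estimate_size(span: dict) -> int:
    """Rough size estimate: length of the JSON encoding."""
    return len(json.dumps(span))


# Stand-in spans with a large attribute, such as a prompt.
fake_spans = [{"name": "llm", "prompt": "x" * 400} for _ in range(5)]
chunks = list(chunk_by_size(fake_spans, estimate_size, max_bytes=1000))
print([len(chunk) for chunk in chunks])
```

With real SDK spans, `size_of` could be something like `lambda s: len(s.to_json(indent=None))`. JSON length is only a proxy for the size of the encoded request, so leave generous headroom. Each chunk would then be passed to `otlp_exporter.export(chunk)`.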

Retry only the failed batch. Spans are stored append-only with no de-duplication. If an export fails, resend only that batch — re-sending batches that already succeeded will duplicate spans in the trace view and double-count trace-based metrics.
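One way to follow that rule is to track success per batch and never revisit a batch once it has been accepted. The sketch below is a hypothetical helper, not a Coval or OpenTelemetry API: `export` is any callable that returns `True` on success, and the attempt count and backoff values are arbitrary.

```python
import time
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def export_batches(
    batches: Sequence[List[T]],
    export: Callable[[List[T]], bool],
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
) -> List[int]:
    """Export each batch in turn, retrying only the batch that failed.

    Returns the indexes of batches that still failed after max_attempts.
    """
    failed: List[int] = []
    for index, batch in enumerate(batches):
        for attempt in range(1, max_attempts + 1):
            if export(batch):
                break  # stored successfully; this batch is never sent again
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)
        else:
            failed.append(index)  # every attempt failed
    return failed


# Demo with a fake exporter that fails the first attempt for batch ["b"].
calls = []


def flaky_export(batch):
    calls.append(tuple(batch))
    return not ("b" in batch and calls.count(tuple(batch)) == 1)


failed = export_batches([["a"], ["b"], ["c"]], flaky_export, backoff_seconds=0)
print(calls, failed)
```

With the OTLP exporter, the callable could be `lambda batch: otlp_exporter.export(batch) == SpanExportResult.SUCCESS`, with `SpanExportResult` imported from `opentelemetry.sdk.trace.export`.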

Spans can arrive before POST /v1/conversations:submit has finished registering the conversation. They are still attributed correctly and reconcile automatically — no special handling needed on your side.

Instrumenting Your Agent

Once the tracer is configured, wrap operations in spans to capture trace data:

# Use tracer in agent code
with tracer.start_as_current_span("llm_tool_call") as span:
    span.set_attribute("function.name", "search_database")
    span.set_attribute("tool_call_id", "call_123")
    result = call_tool()

You can nest spans to capture the full call hierarchy of your agent — for example, a parent span for the overall request and child spans for individual tool calls or LLM invocations.

Shutdown — Call provider.shutdown() when your agent exits. With SimpleSpanProcessor, spans are exported synchronously as each span ends (not buffered), so they are already in Coval before shutdown is called. Shutdown is still good practice for clean resource teardown.

# Call on agent exit for clean resource teardown.
provider.shutdown()

Span Naming Conventions

Coval’s trace viewer applies semantic colors and labels to well-known span names. Using these names gives a richer experience in the UI and enables built-in trace metrics.

Span Name	Use For	Required Attributes	Optional / Recommended Attributes	Accepted Compatibility Aliases
`llm`	LLM invocations	—	`metrics.ttfb` (seconds), `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`, `llm.finish_reason` (`stop`, `tool_calls`, `length`, `content_filter`)	—
`tts`	Text-to-Speech	—	`metrics.ttfb` (seconds)	—
`stt`	Speech-to-Text	`transcript` when using STT Word Error Rate or the Audio Upload variant	`metrics.ttfb` (seconds), `stt.confidence` (ASR confidence 0.0-1.0)	`stt.transcription` is accepted by STT WER for older integrations, but new integrations should emit `transcript`
`stt.provider.<name>`	Per-provider STT attempt (child of `stt`)	—	`stt.providerName`, `stt.confidence`, `metrics.ttfb`	—
`vad`	Voice Activity Detection	—	—	—
`llm_tool_call`	Individual tool/function calls	—	`function.name`, `tool_call_id`, `function.arguments`	Span name `tool_call`; attributes `tool.name`, `tool.call_id`, `tool.arguments`
`turn`	A single conversation turn	—	—	—
`conversation`	Full conversation	—	—	—
`pipeline`	Processing pipeline	—	—	—
`transport`	Audio/network transport	—	—	—

Any span name works — spans with names not listed above will still appear in the UI with auto-assigned colors. Use service.name in your Resource to group spans by service.
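Coval already accepts the compatibility aliases in the table, so renaming is optional. If you want an older integration to emit only the canonical names, a small mapping applied before you create each span may be enough. The sketch below encodes the aliases listed in the table; `canonicalize` and both dictionaries are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Aliases taken from the "Accepted Compatibility Aliases" column above.
SPAN_NAME_ALIASES = {"tool_call": "llm_tool_call"}
ATTRIBUTE_ALIASES = {
    "llm_tool_call": {
        "tool.name": "function.name",
        "tool.call_id": "tool_call_id",
        "tool.arguments": "function.arguments",
    },
    "stt": {"stt.transcription": "transcript"},
}


def canonicalize(name: str, attributes: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Map alias span names and attribute keys to their canonical forms."""
    canonical_name = SPAN_NAME_ALIASES.get(name, name)
    renames = ATTRIBUTE_ALIASES.get(canonical_name, {})
    # Keep non-alias attributes as they are.
    canonical_attrs = {k: v for k, v in attributes.items() if k not in renames}
    # Add renamed aliases, but never overwrite a canonical key already present.
    for old_key, new_key in renames.items():
        if old_key in attributes:
            canonical_attrs.setdefault(new_key, attributes[old_key])
    return canonical_name, canonical_attrs


name, attrs = canonicalize(
    "tool_call", {"tool.name": "search_database", "tool.call_id": "call_123"}
)
print(name, attrs)
```

The result can be used directly: `tracer.start_as_current_span(name)` followed by `span.set_attribute(key, value)` for each entry in `attrs`.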

For complete working implementations, see the voice agent examples on GitHub — Vapi, Pipecat, and LiveKit agents that emit the full span schema.

Instrumenting STT Spans

To use the STT Word Error Rate metric (or its Audio Upload variant), your agent must emit stt spans with a transcript attribute containing the transcribed text. This is what allows Coval to compare your agent’s STT output against a reference transcript. Coval also accepts the older stt.transcription alias for compatibility, but transcript is the canonical attribute for new integrations. We also recommend attaching stt.confidence when your STT provider exposes a per-utterance confidence score. Here is an example using the Pipecat framework:

from opentelemetry import trace as otel_trace
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.utils.tracing.service_decorators import traced_stt


def _read_path(value, *path):
    current = value
    for segment in path:
        if current is None:
            return None
        if isinstance(segment, int):
            if isinstance(current, (list, tuple)) and 0 <= segment < len(current):
                current = current[segment]
            else:
                return None
            continue
        if isinstance(current, dict):
            current = current.get(segment)
        else:
            current = getattr(current, segment, None)
    return current


def extract_stt_confidence(result):
    confidence = _read_path(result, "channel", "alternatives", 0, "confidence")
    if confidence is None:
        return None
    normalized = float(confidence)
    if 0.0 <= normalized <= 1.0:
        return round(normalized, 4)
    return None


class CovalDeepgramSTTService(DeepgramSTTService):
    """Adds stt.confidence to Pipecat's built-in traced `stt` spans."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._current_stt_confidence = None

    async def _on_message(self, message):
        is_final = bool(getattr(message, "is_final", False))
        self._current_stt_confidence = extract_stt_confidence(message) if is_final else None
        try:
            await super()._on_message(message)
        finally:
            if is_final:
                self._current_stt_confidence = None

    @traced_stt
    async def _handle_transcription(self, transcript, is_final, language=None):
        if is_final and self._current_stt_confidence is not None:
            span = otel_trace.get_current_span()
            if span.is_recording():
                span.set_attribute("stt.confidence", self._current_stt_confidence)

Instantiate the subclass in your pipeline. With PipelineTask(..., enable_tracing=True), Pipecat still emits the standard stt span, and the subclass adds stt.confidence onto that same span:

import os

stt = CovalDeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
])

For non-Pipecat agents, emit equivalent spans wherever your STT returns final transcriptions:

from opentelemetry import trace as otel_trace

tracer = otel_trace.get_tracer("my-stt-instrumentation")

with tracer.start_as_current_span("stt") as span:
    span.set_attribute("transcript", transcription_text)
    if confidence_score is not None:
        span.set_attribute("stt.confidence", confidence_score)

The span must be named "stt" and include the transcript attribute with the transcribed text. stt.confidence is optional, but when present it should be a 0.0-1.0 score for the final utterance.
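For intuition about what a word error rate measures, the sketch below computes the textbook definition: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. This is not Coval's implementation. How Coval normalizes casing or punctuation is not described here, so the lowercase-and-split step is an assumption.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # previous[j] holds the edit distance between ref[:i-1] and hyp[:j].
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# One substituted word out of five reference words.
wer = word_error_rate("book a flight to boston", "book a flight to austin")
print(wer)
```

Here the `hypothesis` argument corresponds to the text your agent puts in the `transcript` attribute.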

Instrumenting LLM Spans

Include llm.finish_reason on llm spans so you can tell why the model stopped generating. This is especially useful when debugging responses that were silently cut off because llm.finish_reason=length. Here is a Pipecat example that enriches the built-in traced llm span:

from opentelemetry import trace as otel_trace
from pipecat.services.openai.llm import OpenAILLMService


def _read_path(value, *path):
    current = value
    for segment in path:
        if current is None:
            return None
        if isinstance(segment, int):
            if isinstance(current, (list, tuple)) and 0 <= segment < len(current):
                current = current[segment]
            else:
                return None
            continue
        if isinstance(current, dict):
            current = current.get(segment)
        else:
            current = getattr(current, segment, None)
    return current


class _FinishReasonTrackingStream:
    def __init__(self, stream):
        self._stream = stream
        self._iter = stream.__aiter__()

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._iter.__anext__()
        finish_reason = _read_path(chunk, "choices", 0, "finish_reason")
        if finish_reason is not None:
            span = otel_trace.get_current_span()
            if span.is_recording():
                span.set_attribute("llm.finish_reason", str(finish_reason))
        return chunk

    async def aclose(self):
        if hasattr(self._iter, "aclose"):
            await self._iter.aclose()
        elif hasattr(self._stream, "aclose"):
            await self._stream.aclose()

    async def close(self):
        if hasattr(self._stream, "close"):
            await self._stream.close()
        else:
            await self.aclose()


class CovalOpenAILLMService(OpenAILLMService):
    """Adds llm.finish_reason to Pipecat's built-in traced `llm` spans."""

    async def get_chat_completions(self, params_from_context):
        stream = await super().get_chat_completions(params_from_context)
        return _FinishReasonTrackingStream(stream)

For non-Pipecat agents, set the attribute directly on your llm span after the provider response finishes:

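with tracer.start_as_current_span("llm") as span:
    response = client.chat.completions.create(...)
    # Chat Completions reports finish_reason per choice, not on the response itself.
    finish_reason = response.choices[0].finish_reason
    if finish_reason:
        span.set_attribute("llm.finish_reason", finish_reason)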

Common values include stop, length, tool_calls, and content_filter.

Provider Fallback Spans

Many voice agents use a provider fallback chain for STT — for example, Deepgram → Google → Azure. Without per-provider spans, a single stt span only shows the final result; there is no visibility into which provider served the call, how long each attempt took, or why a fallback triggered. The convention is to create one stt.provider.<name> child span per provider attempt, nested inside the parent stt span:

stt                         ← parent span: final result
  └── stt.provider.deepgram ← attempt 1 (succeeded)

Or for a fallback:

stt                         ← parent span: final result
  ├── stt.provider.deepgram ← attempt 1 (failed, span status = ERROR)
  └── stt.provider.google   ← attempt 2 (succeeded)

Span attributes

Attribute	Type	Description
`stt.providerName`	string	Provider name, e.g. `"deepgram"`, `"google"`, `"azure"`
`stt.confidence`	float	ASR confidence score from this provider (0.0–1.0)
`metrics.ttfb`	float	Time to first byte for this provider attempt (seconds)

Code example

import time
from opentelemetry import trace as otel_trace

tracer = otel_trace.get_tracer("my-stt-instrumentation")

def transcribe_with_fallback(audio):
    providers = [("deepgram", deepgram_stt), ("google", google_stt)]
    final_transcript = None

    with tracer.start_as_current_span("stt") as stt_span:
        for provider_name, stt_fn in providers:
            attempt_start = time.perf_counter()  # monotonic clock, safe for durations
            with tracer.start_as_current_span(f"stt.provider.{provider_name}") as provider_span:
                provider_span.set_attribute("stt.providerName", provider_name)
                try:
                    result = stt_fn(audio)
                    ttfb = time.perf_counter() - attempt_start
                    provider_span.set_attribute("metrics.ttfb", round(ttfb, 4))
                    confidence = getattr(result, "confidence", None)
                    if confidence is not None:
                        provider_span.set_attribute("stt.confidence", confidence)
                    final_transcript = result.transcript
                    break  # success — stop trying fallbacks
                except Exception as e:
                    provider_span.set_status(otel_trace.StatusCode.ERROR, str(e))

        if final_transcript:
            stt_span.set_attribute("transcript", final_transcript)

    return final_transcript

Viewing Traces in Coval

After a simulation completes or conversation traces are received, an OTel Traces card appears in the metric grid on the result page whenever trace data is available. The card shows the total span count and a View Traces button that opens the trace viewer.

To view traces, open a run or conversation, click into a result, and click the OTel Traces card. You can also navigate directly via URL:

https://app.coval.dev/<org-slug>/runs/<run-id>/results/<simulation-output-id>/traces

Traces appear within a few seconds of the simulation completing or being submitted.
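If you log or share links to traces from your own tooling, the URL pattern above can be filled in programmatically. A minimal sketch, where `trace_viewer_url` is a hypothetical helper and the sample values are made up:

```python
from urllib.parse import quote


def trace_viewer_url(org_slug: str, run_id: str, simulation_output_id: str) -> str:
    """Build the trace viewer URL for one simulation result."""
    parts = [org_slug, "runs", run_id, "results", simulation_output_id, "traces"]
    # Percent-encode each segment so unexpected characters cannot break the path.
    return "https://app.coval.dev/" + "/".join(quote(part, safe="") for part in parts)


url = trace_viewer_url("acme", "run-1", "sim-9")
print(url)
```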

Trace viewer features

The trace viewer has two visualization modes, which you can switch between using the toggle in the header:

Waterfall view: Shows spans as horizontal bars on a timeline, nested by parent-child relationships. Use the collapse/expand controls to focus on specific parts of the call hierarchy. You can filter by span type using the color-coded legend pills in the header.

Flame graph view: Shows all spans stacked by depth, giving a bird's-eye view of where time is spent. Interactions include:

Scroll to pan the timeline left/right
Ctrl/Cmd + scroll to zoom in and out
Drag-select a region to zoom into that time range
Double-click a span to zoom to fit that span’s duration
Press F to reset the view to fit the full trace
A mini-map above the flame graph shows the full trace with your current viewport highlighted — drag it to pan quickly

In both views, clicking any span opens a detail panel on the right showing the span’s attributes, timing, status, and parent chain. When no span is selected, the detail panel shows a trace summary with total spans, duration, span type breakdown with time percentages, slowest spans, and any error spans.

Transition Hotspots

Transition Hotspots give you a run-level view of how conversations flow through your agent’s states — and where they fail. Rather than inspecting individual simulations one by one, you can see the full distribution of state-to-state transitions across an entire run at a glance.

Accessing Transition Hotspots

The Hotspots tab appears on the run results page when at least one simulation in the run has OTel trace data. Navigate to a run, then click the Hotspots tab. If the tab is not visible, the run does not contain any traced simulations. You can also access it directly via the ?view=hotspots query parameter on the run results URL.

Reading the Heatmap

The Hotspots view displays a heatmap matrix where:

Rows represent the origin state of a transition (the “from” state)
Columns represent the destination state (the “to” state)
Each cell represents a pair of states — for example, “greeting → account_lookup”

Toggle between two views using the buttons in the header:

View	Description
Counts	Each cell shows how many times that state-to-state transition occurred across all simulations in the run
Failure Rate	Each cell shows the percentage of simulations that failed when hitting that transition

Darker cells indicate higher counts or higher failure rates, depending on the active view.
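To make the two views concrete, the sketch below computes both numbers from a handful of simulations. It is an illustration under stated assumptions, not Coval's algorithm: each simulation is assumed to be reducible to an ordered list of state names plus a pass/fail flag, and the failure rate is taken as failed simulations containing the transition divided by all simulations containing it. `transition_stats` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Transition = Tuple[str, str]


def transition_stats(simulations: List[dict]) -> Dict[Transition, dict]:
    """Return occurrence counts and per-simulation failure rates per transition."""
    counts: Counter = Counter()       # every occurrence of a transition
    sims_total: Counter = Counter()   # simulations that contain the transition
    sims_failed: Counter = Counter()  # failed simulations that contain it
    for sim in simulations:
        states = sim["states"]
        pairs = list(zip(states, states[1:]))  # consecutive (from, to) pairs
        counts.update(pairs)
        for pair in set(pairs):
            sims_total[pair] += 1
            if sim["failed"]:
                sims_failed[pair] += 1
    return {
        pair: {
            "count": counts[pair],
            "failure_rate": sims_failed[pair] / sims_total[pair],
        }
        for pair in counts
    }


simulations = [
    {"states": ["greeting", "account_lookup", "resolution"], "failed": False},
    {"states": ["greeting", "account_lookup", "account_lookup", "escalation"], "failed": True},
    {"states": ["greeting", "account_lookup"], "failed": True},
]
stats = transition_stats(simulations)
for (origin, destination), cell in sorted(stats.items()):
    print(f"{origin} -> {destination}: {cell}")
```

In this sample, "greeting → account_lookup" occurs three times and two of the three simulations that hit it failed, so it would stand out in both views.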

Drilling Down

Click any cell in the heatmap to open a detail panel showing:

The total count and failure count for that transition
Exemplar simulations — individual simulations that passed through that state transition, with direct links to review them

Use exemplars to understand why a particular transition has a high failure rate: open a failing simulation and inspect the transcript and trace together.

Top Hotspots Sidebar

The Top Hotspots sidebar ranks state transitions by failure count, making it easy to find the most impactful problems without scanning the full matrix. The top-ranked transitions are the ones where the most simulations failed.

Span Filters

Use the span type filters to include or exclude specific span types from the transition analysis. Wrapper spans — such as conversation, pipeline, transport, and session:* spans — are automatically collapsed and filtered by default, so the heatmap focuses on the meaningful transitions within your agent’s processing logic.

Start with the Failure Rate view to find which transitions are most problematic, then switch to Counts to understand the volume. A transition with a 100% failure rate but only 1 occurrence is less concerning than one with a 30% failure rate across 50 simulations.

Full Example

from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Configure tracer
resource = Resource.create({SERVICE_NAME: "my-agent"})
provider = trace_sdk.TracerProvider(resource=resource)

exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Simulation-Id": "<simulation_id>",
    },
    timeout=30,
)

provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("my-agent")

# Use tracer in agent code
with tracer.start_as_current_span("llm_tool_call") as span:
    span.set_attribute("function.name", "search_database")
    span.set_attribute("tool_call_id", "call_123")
    result = call_tool()

# Call on agent exit for clean resource teardown.
provider.shutdown()

Using Span Attributes in Custom Metrics

Any numeric span attribute your agent emits can be measured using a Custom Trace Metric (METRIC_CUSTOM_TRACE). This lets you track latency, token counts, or any other numeric value from your traces without writing custom evaluation code. To create a custom trace metric, specify:

Span Name — the span_name of the spans to aggregate (e.g. llm, tts, or any custom span you create)
Metric Attribute — the span attribute key containing the numeric value (e.g. metrics.ttfb, token_count)
Aggregation Method — how to aggregate across turns: average, median, p90, max, or min

See Create Metric for the full API reference.
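The three settings above amount to a filter, a lookup, and an aggregation. The sketch below shows that computation on plain dicts so you can sanity-check expected values locally. It is not Coval's implementation: `aggregate_span_attribute` is a hypothetical helper, and the nearest-rank definition of p90 is an assumption, since the exact percentile method is not stated here.

```python
import statistics
from typing import Any, Dict, List


def aggregate_span_attribute(
    spans: List[Dict[str, Any]], span_name: str, attribute: str, method: str
) -> float:
    """Aggregate one numeric attribute across all spans with the given name."""
    values = []
    for span in spans:
        value = span.get("attributes", {}).get(attribute)
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if span.get("name") == span_name and is_number:
            values.append(float(value))
    if not values:
        raise ValueError(f"no numeric '{attribute}' on spans named '{span_name}'")
    if method == "average":
        return statistics.fmean(values)
    if method == "median":
        return statistics.median(values)
    if method == "max":
        return max(values)
    if method == "min":
        return min(values)
    if method == "p90":
        ordered = sorted(values)
        rank = (9 * len(ordered) + 9) // 10  # ceil(0.9 * n), nearest-rank method
        return ordered[rank - 1]
    raise ValueError(f"unknown aggregation method: {method}")


spans = [
    {"name": "llm", "attributes": {"metrics.ttfb": 0.25}},
    {"name": "llm", "attributes": {"metrics.ttfb": 0.5}},
    {"name": "llm", "attributes": {"metrics.ttfb": 0.75}},
    {"name": "tts", "attributes": {"metrics.ttfb": 0.125}},
]
for method in ("average", "median", "p90", "max", "min"):
    print(method, aggregate_span_attribute(spans, "llm", "metrics.ttfb", method))
```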
