Beta Feature — Tracing with OpenTelemetry is currently in beta and under active development. Functionality and APIs may change as we continue to improve the experience.
You can send traces from your agent to Coval using the OpenTelemetry SDK. This lets you capture detailed span data — such as tool calls, LLM invocations, and other operations — and export it directly to Coval for analysis alongside your simulation or monitoring results. Tracing works for both simulations (where Coval calls your agent) and monitoring (where you submit post-hoc call data). The setup differs only in how you identify the call — everything else (instrumentation, span naming, viewing) is the same.
New to tracing? If you’re using Pipecat, LiveKit, or Vapi, the Coval Wizard (Beta) can instrument your agent automatically with one command: npx @coval/wizard

Prerequisites

  • A Coval account with an API key (manage your keys)
  • A simulation output ID (for simulations) or a conversation ID (for monitoring)
  • Python 3.8+ with the OpenTelemetry SDK installed
Install the required packages:
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http

Configuration

Configure the OpenTelemetry tracer provider to export spans to Coval’s trace ingestion endpoint:
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Configure tracer
resource = Resource.create({SERVICE_NAME: "my-agent"})
provider = trace_sdk.TracerProvider(resource=resource)

exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Simulation-Id": "<simulation_id>",
    },
    timeout=30,
)

provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("my-agent")
  • endpoint: Coval’s OTLP trace ingestion URL (https://api.coval.dev/v1/traces)
  • X-API-Key: Your Coval API key
  • X-Simulation-Id: The simulation output ID for the individual call being traced. This is per-simulation-call, not the run ID.
  • timeout: Export timeout in seconds. Must be set to 30 (see note below)
  • SERVICE_NAME: A name identifying your agent service
The timeout parameter must be set to 30 seconds to ensure spans are exported reliably. We are working on reducing this requirement in a future update.

Getting the Simulation Output ID

The X-Simulation-Id header must be set to the simulation output ID for the specific call you’re tracing. The simulation output ID is a per-call identifier — different from the run ID. Here’s how to obtain it at runtime.

Inbound voice agents

When Coval places an inbound call, it passes the simulation output ID as a SIP header: X-Coval-Simulation-Id. Read this header when the call arrives and use it to configure your OTLP exporter.
# Example: reading the simulation output ID from a SIP header
# In your call.initiated webhook handler (Telnyx example):
simulation_id = next(
    (h["value"] for h in event["payload"]["sip_headers"]
     if h["name"] == "X-Coval-Simulation-Id"),
    None,  # header absent: handle the call without tracing
)

exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Simulation-Id": simulation_id,
    },
    timeout=30,
)
See the Inbound Voice guide for provider-specific instructions on reading SIP headers (Twilio, Telnyx, etc.).

Outbound voice agents

Coval’s outbound trigger POST can include the simulation output ID in the request payload. Add simulation_output_id to your trigger_call_payload configuration in your template, then read it when your webhook receives the trigger and use it to configure the exporter.
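As a minimal sketch of the webhook side (the payload fields other than simulation_output_id are hypothetical example values), you might read the ID from the trigger body and build the exporter headers like this:

```python
# Hypothetical trigger body as your webhook would receive it, assuming you
# added simulation_output_id to trigger_call_payload in your template.
payload = {
    "phone_number": "+15555550123",            # hypothetical example field
    "simulation_output_id": "sim-out-abc123",  # hypothetical example value
}

simulation_id = payload.get("simulation_output_id")

# Use these headers when constructing OTLPSpanExporter(endpoint=..., timeout=30)
# exactly as in the Configuration section.
exporter_headers = {
    "X-API-Key": "<COVAL_API_KEY>",
    "X-Simulation-Id": simulation_id,
}
```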
You can also find simulation output IDs in the Coval dashboard under any run’s results, or via the Coval API.

Tracing for Monitoring Calls

For monitoring (post-hoc call evaluation), there is no Coval-initiated call, so there is no simulation output ID available at call time. Instead, you use a conversation ID to associate traces with a monitoring conversation. The conversation ID is only available after the call ends and you submit the transcript to Coval — which means you can’t configure the OTLP exporter up front. The solution is to buffer spans in memory during the call, then flush them once you have the ID.

1. Buffer spans during the call

Use InMemorySpanExporter (included in opentelemetry-sdk) to hold spans locally during the call instead of exporting them in real time.
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

resource = Resource.create({SERVICE_NAME: "my-agent"})
in_memory_exporter = InMemorySpanExporter()
provider = trace_sdk.TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(in_memory_exporter))
tracer = provider.get_tracer("my-agent")

# Instrument your agent as normal — spans accumulate in memory
with tracer.start_as_current_span("llm") as span:
    span.set_attribute("metrics.ttfb", 0.42)
    response = call_llm()

2. Submit the conversation after the call ends

Post the transcript (and optionally audio) to POST /v1/conversations:submit. The response contains the conversation_id you need for trace export.
import requests

response = requests.post(
    "https://api.coval.dev/v1/conversations:submit",
    headers={
        "x-api-key": "<COVAL_API_KEY>",
        "Content-Type": "application/json",
    },
    json={
        "transcript": [
            {"role": "user", "content": "Hello", "start_time": 0.0, "end_time": 1.2},
            {"role": "assistant", "content": "Hi! How can I help?", "start_time": 1.5, "end_time": 3.0},
        ],
    },
)
conversation_id = response.json()["conversation"]["conversation_id"]
See POST /v1/conversations:submit for the full request schema including optional audio, metadata, and metrics fields.

3. Export the buffered spans

Create an OTLP exporter with X-Conversation-Id and flush the buffered spans to Coval.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

otlp_exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Conversation-Id": conversation_id,
    },
    timeout=30,
)

finished_spans = in_memory_exporter.get_finished_spans()
if finished_spans:
    otlp_exporter.export(list(finished_spans))
  • X-Conversation-Id: The conversation_id returned by POST /v1/conversations:submit. Use this instead of X-Simulation-Id.
Traces can be sent immediately after submitting a conversation — no delay is needed.

Full monitoring example

import requests
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

COVAL_API_KEY = "<COVAL_API_KEY>"

# --- Call setup: buffer spans in memory ---
resource = Resource.create({SERVICE_NAME: "my-agent"})
in_memory_exporter = InMemorySpanExporter()
provider = trace_sdk.TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(in_memory_exporter))
tracer = provider.get_tracer("my-agent")

# --- During the call: instrument as normal ---
with tracer.start_as_current_span("llm") as span:
    span.set_attribute("metrics.ttfb", 0.42)
    response = call_llm()

with tracer.start_as_current_span("tts") as span:
    span.set_attribute("metrics.ttfb", 0.18)
    audio = synthesize_speech(response)

# --- After the call ends: submit transcript, then export spans ---
transcript = [  # turn list collected during the call (same schema as above)
    {"role": "user", "content": "Hello", "start_time": 0.0, "end_time": 1.2},
    {"role": "assistant", "content": "Hi! How can I help?", "start_time": 1.5, "end_time": 3.0},
]
submit_response = requests.post(
    "https://api.coval.dev/v1/conversations:submit",
    headers={"x-api-key": COVAL_API_KEY, "Content-Type": "application/json"},
    json={"transcript": transcript},
)
conversation_id = submit_response.json()["conversation"]["conversation_id"]

otlp_exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={"X-API-Key": COVAL_API_KEY, "X-Conversation-Id": conversation_id},
    timeout=30,
)
finished_spans = in_memory_exporter.get_finished_spans()
if finished_spans:
    otlp_exporter.export(list(finished_spans))

Uploading Traces via the Dashboard

You can also upload traces directly from the Coval dashboard without using the SDK. In the Monitoring page, click Upload to Monitoring and:
  1. Add your audio file or transcript as usual
  2. In the Traces (Optional) section, select your OTLP traces JSON file (must contain a resourceSpans array)
  3. Click Upload — the conversation and traces are submitted together
This is useful for testing, debugging, or uploading historical traces that were captured separately.
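If you need a file to test the upload with, a minimal OTLP/JSON skeleton can be written by hand (the IDs and timestamps below are made-up example values; real OTLP exporters generate this format for you):

```python
import json

# Hand-built minimal OTLP/JSON payload: one resource, one scope, one span.
trace_file = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "my-agent"}},
        ]},
        "scopeSpans": [{
            "scope": {"name": "my-agent"},
            "spans": [{
                "traceId": "5b8efff798038103d269b633813fc60c",
                "spanId": "eee19b7ec3c1b174",
                "name": "llm",
                "kind": 1,
                "startTimeUnixNano": "1700000000000000000",
                "endTimeUnixNano": "1700000000500000000",
                "attributes": [
                    {"key": "metrics.ttfb", "value": {"doubleValue": 0.42}},
                ],
            }],
        }],
    }],
}

with open("traces.json", "w") as f:
    json.dump(trace_file, f, indent=2)
```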

Instrumenting Your Agent

Once the tracer is configured, wrap operations in spans to capture trace data:
# Use tracer in agent code
with tracer.start_as_current_span("tool_call") as span:
    span.set_attribute("tool.name", "search_database")
    result = call_tool()
You can nest spans to capture the full call hierarchy of your agent — for example, a parent span for the overall request and child spans for individual tool calls or LLM invocations.
Shutdown — Call provider.shutdown() when your agent exits. With SimpleSpanProcessor, spans are exported synchronously as each span ends (not buffered), so they are already in Coval before shutdown is called. Shutdown is still good practice for clean resource teardown.
# Call on agent exit for clean resource teardown.
provider.shutdown()

Span Naming Conventions

Coval’s trace viewer applies semantic colors and labels to well-known span names. Using these names gives a richer experience in the UI.
  • llm: LLM invocations. Key attributes: metrics.ttfb (seconds), gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, llm.finish_reason (stop, tool_calls, length, content_filter)
  • tts: Text-to-Speech. Key attributes: metrics.ttfb (seconds)
  • stt: Speech-to-Text. Key attributes: metrics.ttfb (seconds), transcript (transcribed text; required for STT Word Error Rate and its Audio Upload variant), stt.confidence (ASR confidence 0.0–1.0)
  • stt.provider.<name>: Per-provider STT attempt (child of stt). Key attributes: stt.providerName, stt.confidence, metrics.ttfb
  • vad: Voice Activity Detection
  • llm_tool_call: Individual tool/function calls. Key attributes: function.name, tool_call_id, function.arguments
  • turn: A single conversation turn
  • conversation: Full conversation
  • pipeline: Processing pipeline
  • transport: Audio/network transport
Any span name works — spans with names not listed above will still appear in the UI with auto-assigned colors. Use service.name in your Resource to group spans by service.
For complete working implementations, see the voice agent examples on GitHub — Vapi, Pipecat, and LiveKit agents that emit the full span schema.

Instrumenting STT Spans

To use the STT Word Error Rate metric (or its Audio Upload variant), your agent must emit stt spans with a transcript attribute containing the transcribed text. This is what allows Coval to compare your agent’s STT output against a reference transcript. We also recommend attaching stt.confidence when your STT provider exposes a per-utterance confidence score. Here is an example using the Pipecat framework:
from opentelemetry import trace as otel_trace
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.utils.tracing.service_decorators import traced_stt


def _read_path(value, *path):
    current = value
    for segment in path:
        if current is None:
            return None
        if isinstance(segment, int):
            if isinstance(current, (list, tuple)) and 0 <= segment < len(current):
                current = current[segment]
            else:
                return None
            continue
        if isinstance(current, dict):
            current = current.get(segment)
        else:
            current = getattr(current, segment, None)
    return current


def extract_stt_confidence(result):
    confidence = _read_path(result, "channel", "alternatives", 0, "confidence")
    if confidence is None:
        return None
    normalized = float(confidence)
    if 0.0 <= normalized <= 1.0:
        return round(normalized, 4)
    return None


class CovalDeepgramSTTService(DeepgramSTTService):
    """Adds stt.confidence to Pipecat's built-in traced `stt` spans."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._current_stt_confidence = None

    async def _on_message(self, *args, **kwargs):
        result = kwargs.get("result")
        is_final = bool(getattr(result, "is_final", False)) if result else False
        self._current_stt_confidence = extract_stt_confidence(result) if is_final else None
        try:
            await super()._on_message(*args, **kwargs)
        finally:
            if is_final:
                self._current_stt_confidence = None

    @traced_stt
    async def _handle_transcription(self, transcript, is_final, language=None):
        if is_final and self._current_stt_confidence is not None:
            span = otel_trace.get_current_span()
            if span.is_recording():
                span.set_attribute("stt.confidence", self._current_stt_confidence)
Instantiate the subclass in your pipeline. With PipelineTask(..., enable_tracing=True), Pipecat still emits the standard stt span, and the subclass adds stt.confidence onto that same span:
stt = CovalDeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))

pipeline = Pipeline([
    transport.input(),
    stt,
    context_aggregator.user(),
    llm,
    tts,
    transport.output(),
])
For non-Pipecat agents, emit equivalent spans wherever your STT returns final transcriptions:
from opentelemetry import trace as otel_trace

tracer = otel_trace.get_tracer("my-stt-instrumentation")

with tracer.start_as_current_span("stt") as span:
    span.set_attribute("transcript", transcription_text)
    if confidence_score is not None:
        span.set_attribute("stt.confidence", confidence_score)
The span must be named "stt" and include the transcript attribute with the transcribed text. stt.confidence is optional, but when present it should be a 0.0–1.0 score for the final utterance.

Instrumenting LLM Spans

Include llm.finish_reason on llm spans so you can tell why the model stopped generating. This is especially useful when debugging responses that were silently cut off because llm.finish_reason=length. Here is a Pipecat example that enriches the built-in traced llm span:
from opentelemetry import trace as otel_trace
from pipecat.services.openai.llm import OpenAILLMService


def _read_path(value, *path):
    current = value
    for segment in path:
        if current is None:
            return None
        if isinstance(segment, int):
            if isinstance(current, (list, tuple)) and 0 <= segment < len(current):
                current = current[segment]
            else:
                return None
            continue
        if isinstance(current, dict):
            current = current.get(segment)
        else:
            current = getattr(current, segment, None)
    return current


class _FinishReasonTrackingStream:
    def __init__(self, stream):
        self._stream = stream
        self._iter = stream.__aiter__()

    def __aiter__(self):
        return self

    async def __anext__(self):
        chunk = await self._iter.__anext__()
        finish_reason = _read_path(chunk, "choices", 0, "finish_reason")
        if finish_reason is not None:
            span = otel_trace.get_current_span()
            if span.is_recording():
                span.set_attribute("llm.finish_reason", str(finish_reason))
        return chunk

    async def aclose(self):
        if hasattr(self._iter, "aclose"):
            await self._iter.aclose()
        elif hasattr(self._stream, "aclose"):
            await self._stream.aclose()

    async def close(self):
        if hasattr(self._stream, "close"):
            await self._stream.close()
        else:
            await self.aclose()


class CovalOpenAILLMService(OpenAILLMService):
    """Adds llm.finish_reason to Pipecat's built-in traced `llm` spans."""

    async def get_chat_completions(self, params_from_context):
        stream = await super().get_chat_completions(params_from_context)
        return _FinishReasonTrackingStream(stream)
For non-Pipecat agents, set the attribute directly on your llm span after the provider response finishes:
with tracer.start_as_current_span("llm") as span:
    response = client.responses.create(...)
    if response.finish_reason:
        span.set_attribute("llm.finish_reason", response.finish_reason)
Common values include stop, length, tool_calls, and content_filter.

Provider Fallback Spans

Many voice agents use a provider fallback chain for STT — for example, Deepgram → Google → Azure. Without per-provider spans, a single stt span only shows the final result; there is no visibility into which provider served the call, how long each attempt took, or why a fallback triggered. The convention is to create one stt.provider.<name> child span per provider attempt, nested inside the parent stt span:
stt                         ← parent span: final result
  └── stt.provider.deepgram ← attempt 1 (succeeded)
Or for a fallback:
stt                         ← parent span: final result
  ├── stt.provider.deepgram ← attempt 1 (failed, span status = ERROR)
  └── stt.provider.google   ← attempt 2 (succeeded)

Span attributes

  • stt.providerName (string): Provider name, e.g. "deepgram", "google", "azure"
  • stt.confidence (float): ASR confidence score from this provider (0.0–1.0)
  • metrics.ttfb (float): Time to first byte for this provider attempt (seconds)

Code example

import time
from opentelemetry import trace as otel_trace

tracer = otel_trace.get_tracer("my-stt-instrumentation")

def transcribe_with_fallback(audio):
    providers = [("deepgram", deepgram_stt), ("google", google_stt)]
    final_transcript = None

    with tracer.start_as_current_span("stt") as stt_span:
        for provider_name, stt_fn in providers:
            attempt_start = time.time()
            with tracer.start_as_current_span(f"stt.provider.{provider_name}") as provider_span:
                provider_span.set_attribute("stt.providerName", provider_name)
                try:
                    result = stt_fn(audio)
                    ttfb = time.time() - attempt_start
                    provider_span.set_attribute("metrics.ttfb", round(ttfb, 4))
                    confidence = getattr(result, "confidence", None)
                    if confidence is not None:
                        provider_span.set_attribute("stt.confidence", confidence)
                    final_transcript = result.transcript
                    break  # success — stop trying fallbacks
                except Exception as e:
                    provider_span.set_status(otel_trace.StatusCode.ERROR, str(e))

        if final_transcript:
            stt_span.set_attribute("transcript", final_transcript)

    return final_transcript

Viewing Traces in Coval

When trace data is available, an OTel Traces card appears in the metric grid on the result page after a simulation completes or monitoring traces are received. The card shows the total span count and a View Traces button that opens the trace viewer. To view traces: open a run or monitoring result, click into an individual result, and click the OTel Traces card. You can also navigate directly via URL:
https://app.coval.dev/<org-slug>/runs/<run-id>/results/<simulation-output-id>/traces
Traces appear within a few seconds of the simulation completing or being submitted.

Trace viewer features

The trace viewer has two visualization modes you can switch between using the toggle in the header:
  • Waterfall view: Shows spans as horizontal bars on a timeline, nested by parent-child relationships. Use the collapse/expand controls to focus on specific parts of the call hierarchy, and filter by span type using the color-coded legend pills in the header.
  • Flame graph view: Shows all spans stacked by depth, giving a bird’s-eye view of where time is spent.
Interactions include:
  • Scroll to pan the timeline left/right
  • Ctrl/Cmd + scroll to zoom in and out
  • Drag-select a region to zoom into that time range
  • Double-click a span to zoom to fit that span’s duration
  • Press F to reset the view to fit the full trace
  • A mini-map above the flame graph shows the full trace with your current viewport highlighted — drag it to pan quickly
In both views, clicking any span opens a detail panel on the right showing the span’s attributes, timing, status, and parent chain. When no span is selected, the detail panel shows a trace summary with total spans, duration, span type breakdown with time percentages, slowest spans, and any error spans.

Transition Hotspots

Transition Hotspots give you a run-level view of how conversations flow through your agent’s states — and where they fail. Rather than inspecting individual simulations one by one, you can see the full distribution of state-to-state transitions across an entire run at a glance.

Walkthrough

Accessing Transition Hotspots

The Hotspots tab appears on the run results page when at least one simulation in the run has OTel trace data. Navigate to a run, then click the Hotspots tab. If the tab is not visible, the run does not contain any traced simulations. You can also access it directly via the ?view=hotspots query parameter on the run results URL.

Reading the Heatmap

The Hotspots view displays a heatmap matrix where:
  • Rows represent the origin state of a transition (the “from” state)
  • Columns represent the destination state (the “to” state)
  • Each cell represents a pair of states — for example, “greeting → account_lookup”
Toggle between two views using the buttons in the header:
  • Counts: Each cell shows how many times that state-to-state transition occurred across all simulations in the run
  • Failure Rate: Each cell shows the percentage of simulations that failed when hitting that transition
Darker cells indicate higher counts or higher failure rates, depending on the active view.

Drilling Down

Click any cell in the heatmap to open a detail panel showing:
  • The total count and failure count for that transition
  • Exemplar simulations — individual simulations that passed through that state transition, with direct links to review them
Use exemplars to understand why a particular transition has a high failure rate: open a failing simulation and inspect the transcript and trace together.

Top Hotspots Sidebar

The Top Hotspots sidebar ranks state transitions by failure count, making it easy to find the most impactful problems without scanning the full matrix. The top-ranked transitions are the ones where the most simulations failed.

Span Filters

Use the span type filters to include or exclude specific span types from the transition analysis. Wrapper spans — such as conversation, pipeline, transport, and session:* spans — are automatically collapsed and filtered by default, so the heatmap focuses on the meaningful transitions within your agent’s processing logic.
Start with the Failure Rate view to find which transitions are most problematic, then switch to Counts to understand the volume. A transition with a 100% failure rate but only 1 occurrence is less concerning than one with a 30% failure rate across 50 simulations.

Full Example

from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource

# Configure tracer
resource = Resource.create({SERVICE_NAME: "my-agent"})
provider = trace_sdk.TracerProvider(resource=resource)

exporter = OTLPSpanExporter(
    endpoint="https://api.coval.dev/v1/traces",
    headers={
        "X-API-Key": "<COVAL_API_KEY>",
        "X-Simulation-Id": "<simulation_id>",
    },
    timeout=30,
)

provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("my-agent")

# Use tracer in agent code
with tracer.start_as_current_span("tool_call") as span:
    span.set_attribute("tool.name", "search_database")
    result = call_tool()

# Call on agent exit for clean resource teardown.
provider.shutdown()

Using Span Attributes in Custom Metrics

Any numeric span attribute your agent emits can be measured using a Custom Trace Metric (METRIC_CUSTOM_TRACE). This lets you track latency, token counts, or any other numeric value from your traces without writing custom evaluation code. To create a custom trace metric, specify:
  • Span Name — the span_name of the spans to aggregate (e.g. llm, tts, or any custom span you create)
  • Metric Attribute — the span attribute key containing the numeric value (e.g. metrics.ttfb, token_count)
  • Aggregation Method — how to aggregate across turns: average, median, p90, max, or min
See Create Metric for the full API reference.