Beta Feature — Tracing with OpenTelemetry is currently in beta and under active development. Functionality and APIs may change as we continue to improve the experience.
You can send traces from your agent to Coval using the OpenTelemetry SDK. This lets you capture detailed span data — such as tool calls, LLM invocations, and other operations — and export it directly to Coval for analysis alongside your simulation or monitoring results.
Tracing works for both simulations (where Coval calls your agent) and monitoring (where you submit post-hoc call data). The setup differs only in how you identify the call — everything else (instrumentation, span naming, viewing) is the same.
New to tracing? If you’re using Pipecat, LiveKit, or Vapi, the Coval Wizard (Beta) can instrument your agent automatically with one command: npx @coval/wizard
Prerequisites
- A Coval account with an API key (manage your keys)
- A simulation output ID (for simulations) or a conversation ID (for monitoring)
- Python 3.8+ with the OpenTelemetry SDK installed
Install the required packages:
pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
Configuration
Configure the OpenTelemetry tracer provider to export spans to Coval’s trace ingestion endpoint:
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
# Configure tracer
resource = Resource.create({SERVICE_NAME: "my-agent"})
provider = trace_sdk.TracerProvider(resource=resource)
exporter = OTLPSpanExporter(
endpoint="https://api.coval.dev/v1/traces",
headers={
"X-API-Key": "<COVAL_API_KEY>",
"X-Simulation-Id": "<simulation_id>",
},
timeout=30,
)
provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("my-agent")
| Parameter | Description |
|---|---|
| endpoint | Coval's OTLP trace ingestion URL: https://api.coval.dev/v1/traces |
| X-API-Key | Your Coval API key |
| X-Simulation-Id | The simulation output ID for the individual call being traced. This is per-simulation-call, not the run ID. |
| timeout | Export timeout in seconds. Must be set to 30 (see note below). |
| SERVICE_NAME | A name identifying your agent service |
The timeout parameter must be set to 30 seconds to ensure spans are exported reliably. We are working on reducing this requirement in a future update.
Getting the Simulation Output ID
The X-Simulation-Id header must be set to the simulation output ID for the specific call you’re tracing. The simulation output ID is a per-call identifier — different from the run ID. Here’s how to obtain it at runtime.
Inbound voice agents
When Coval places an inbound call, it passes the simulation output ID as a SIP header: X-Coval-Simulation-Id. Read this header when the call arrives and use it to configure your OTLP exporter.
# Example: reading the simulation output ID from a SIP header
# In your call.initiated webhook handler (Telnyx example):
simulation_id = next(
    (h["value"] for h in event["payload"]["sip_headers"]
     if h["name"] == "X-Coval-Simulation-Id"),
    None,  # header absent — e.g. a call not initiated by Coval
)
exporter = OTLPSpanExporter(
endpoint="https://api.coval.dev/v1/traces",
headers={
"X-API-Key": "<COVAL_API_KEY>",
"X-Simulation-Id": simulation_id,
},
timeout=30,
)
See the Inbound Voice guide for provider-specific instructions on reading SIP headers (Twilio, Telnyx, etc.).
Outbound voice agents
Coval’s outbound trigger POST can include the simulation output ID in the request payload. Add simulation_output_id to your trigger_call_payload configuration in your template, then read it when your webhook receives the trigger and use it to configure the exporter.
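The exact payload shape depends on your template configuration, so treat this as a sketch: a small helper that pulls simulation_output_id out of the trigger webhook's JSON body and builds the exporter headers. The function name and payload keys are illustrative and should match your own configuration.

```python
from typing import Optional

def coval_trace_headers(trigger_payload: dict, api_key: str) -> Optional[dict]:
    """Build OTLP exporter headers from an outbound trigger payload.

    Assumes simulation_output_id was added to trigger_call_payload in the
    template, so it arrives in the webhook's JSON body.
    """
    simulation_id = trigger_payload.get("simulation_output_id")
    if simulation_id is None:
        return None  # not a Coval-triggered call; skip trace setup
    return {"X-API-Key": api_key, "X-Simulation-Id": simulation_id}
```

Pass the returned dict as the headers= argument to OTLPSpanExporter, exactly as in the inbound example above.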
You can also find simulation output IDs in the Coval dashboard under any run’s results, or via the Coval API.
Tracing for Monitoring Calls
For monitoring (post-hoc call evaluation), there is no Coval-initiated call, so there is no simulation output ID available at call time. Instead, you use a conversation ID to associate traces with a monitoring conversation.
The conversation ID is only available after the call ends and you submit the transcript to Coval — which means you can’t configure the OTLP exporter up front. The solution is to buffer spans in memory during the call, then flush them once you have the ID.
Buffer spans during the call
Use InMemorySpanExporter (included in opentelemetry-sdk) to hold spans locally during the call instead of exporting them in real time.
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
resource = Resource.create({SERVICE_NAME: "my-agent"})
in_memory_exporter = InMemorySpanExporter()
provider = trace_sdk.TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(in_memory_exporter))
tracer = provider.get_tracer("my-agent")
# Instrument your agent as normal — spans accumulate in memory
with tracer.start_as_current_span("llm") as span:
span.set_attribute("metrics.ttfb", 0.42)
response = call_llm()
Submit the conversation after the call ends
Post the transcript (and optionally audio) to POST /v1/conversations:submit. The response contains the conversation_id you need for trace export.
import requests
response = requests.post(
"https://api.coval.dev/v1/conversations:submit",
headers={
"x-api-key": "<COVAL_API_KEY>",
"Content-Type": "application/json",
},
json={
"transcript": [
{"role": "user", "content": "Hello", "start_time": 0.0, "end_time": 1.2},
{"role": "assistant", "content": "Hi! How can I help?", "start_time": 1.5, "end_time": 3.0},
],
},
)
conversation_id = response.json()["conversation"]["conversation_id"]
See POST /v1/conversations:submit for the full request schema including optional audio, metadata, and metrics fields.
Export the buffered spans
Create an OTLP exporter with X-Conversation-Id and flush the buffered spans to Coval.
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
otlp_exporter = OTLPSpanExporter(
endpoint="https://api.coval.dev/v1/traces",
headers={
"X-API-Key": "<COVAL_API_KEY>",
"X-Conversation-Id": conversation_id,
},
timeout=30,
)
finished_spans = in_memory_exporter.get_finished_spans()
if finished_spans:
otlp_exporter.export(list(finished_spans))
| Parameter | Description |
|---|---|
| X-Conversation-Id | The conversation_id returned by POST /v1/conversations:submit. Use this instead of X-Simulation-Id. |
Traces can be sent immediately after submitting a conversation — no delay is needed.
Full monitoring example
import requests
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.trace.export.in_memory_span_exporter import InMemorySpanExporter
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
COVAL_API_KEY = "<COVAL_API_KEY>"
# --- Call setup: buffer spans in memory ---
resource = Resource.create({SERVICE_NAME: "my-agent"})
in_memory_exporter = InMemorySpanExporter()
provider = trace_sdk.TracerProvider(resource=resource)
provider.add_span_processor(SimpleSpanProcessor(in_memory_exporter))
tracer = provider.get_tracer("my-agent")
# --- During the call: instrument as normal ---
with tracer.start_as_current_span("llm") as span:
span.set_attribute("metrics.ttfb", 0.42)
response = call_llm()
with tracer.start_as_current_span("tts") as span:
span.set_attribute("metrics.ttfb", 0.18)
audio = synthesize_speech(response)
# --- After the call ends: submit transcript, then export spans ---
submit_response = requests.post(
"https://api.coval.dev/v1/conversations:submit",
headers={"x-api-key": COVAL_API_KEY, "Content-Type": "application/json"},
json={"transcript": transcript},
)
conversation_id = submit_response.json()["conversation"]["conversation_id"]
otlp_exporter = OTLPSpanExporter(
endpoint="https://api.coval.dev/v1/traces",
headers={"X-API-Key": COVAL_API_KEY, "X-Conversation-Id": conversation_id},
timeout=30,
)
finished_spans = in_memory_exporter.get_finished_spans()
if finished_spans:
otlp_exporter.export(list(finished_spans))
Uploading Traces via the Dashboard
You can also upload traces directly from the Coval dashboard without using the SDK. In the Monitoring page, click Upload to Monitoring and:
- Add your audio file or transcript as usual
- In the Traces (Optional) section, select your OTLP traces JSON file (must contain a resourceSpans array)
- Click Upload — the conversation and traces are submitted together
This is useful for testing, debugging, or uploading historical traces that were captured separately.
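If you need a file to test the upload with, the expected format is standard OTLP/JSON. This sketch writes a minimal document with the required resourceSpans array; the trace/span IDs and timestamps below are placeholder values, and a real file would normally come from an OTLP exporter or collector.

```python
import json

# Minimal OTLP/JSON document with the required top-level resourceSpans array.
# IDs and timestamps are placeholders; a real export comes from your SDK.
trace_doc = {
    "resourceSpans": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "my-agent"}},
        ]},
        "scopeSpans": [{
            "scope": {"name": "my-agent"},
            "spans": [{
                "traceId": "0af7651916cd43dd8448eb211c80319c",
                "spanId": "b7ad6b7169203331",
                "name": "llm",
                "startTimeUnixNano": "1700000000000000000",
                "endTimeUnixNano": "1700000000500000000",
                "attributes": [
                    {"key": "metrics.ttfb", "value": {"doubleValue": 0.42}},
                ],
            }],
        }],
    }],
}

with open("traces.json", "w") as f:
    json.dump(trace_doc, f, indent=2)
```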
Instrumenting Your Agent
Once the tracer is configured, wrap operations in spans to capture trace data:
# Use tracer in agent code
with tracer.start_as_current_span("tool_call") as span:
span.set_attribute("tool.name", "search_database")
result = call_tool()
You can nest spans to capture the full call hierarchy of your agent — for example, a parent span for the overall request and child spans for individual tool calls or LLM invocations.
Shutdown — Call provider.shutdown() when your agent exits. With SimpleSpanProcessor, spans are exported synchronously as each span ends (not buffered), so they are already in Coval before shutdown is called. Shutdown is still good practice for clean resource teardown.
# Call on agent exit for clean resource teardown.
provider.shutdown()
Span Naming Conventions
Coval’s trace viewer applies semantic colors and labels to well-known span names. Using these names gives a richer experience in the UI.
| Span Name | Use For | Key Attributes |
|---|---|---|
| llm | LLM invocations | metrics.ttfb (seconds), gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, llm.finish_reason (stop, tool_calls, length, content_filter) |
| tts | Text-to-Speech | metrics.ttfb (seconds) |
| stt | Speech-to-Text | metrics.ttfb (seconds), transcript (transcribed text — required for STT Word Error Rate and Audio Upload variant), stt.confidence (ASR confidence 0.0–1.0) |
| stt.provider.<name> | Per-provider STT attempt (child of stt) | stt.providerName, stt.confidence, metrics.ttfb |
| vad | Voice Activity Detection | — |
| llm_tool_call | Individual tool/function calls | function.name, tool_call_id, function.arguments |
| turn | A single conversation turn | — |
| conversation | Full conversation | — |
| pipeline | Processing pipeline | — |
| transport | Audio/network transport | — |
Any span name works — spans with names not listed above will still appear in the UI with auto-assigned colors. Use service.name in your Resource to group spans by service.
For complete working implementations, see the voice agent examples on GitHub — Vapi, Pipecat, and LiveKit agents that emit the full span schema.
Instrumenting STT Spans
To use the STT Word Error Rate metric (or its Audio Upload variant), your agent must emit stt spans with a transcript attribute containing the transcribed text. This is what allows Coval to compare your agent’s STT output against a reference transcript. We also recommend attaching stt.confidence when your STT provider exposes a per-utterance confidence score.
Here is an example using the Pipecat framework:
from opentelemetry import trace as otel_trace
from pipecat.services.deepgram.stt import DeepgramSTTService
from pipecat.utils.tracing.service_decorators import traced_stt
def _read_path(value, *path):
current = value
for segment in path:
if current is None:
return None
if isinstance(segment, int):
if isinstance(current, (list, tuple)) and 0 <= segment < len(current):
current = current[segment]
else:
return None
continue
if isinstance(current, dict):
current = current.get(segment)
else:
current = getattr(current, segment, None)
return current
def extract_stt_confidence(result):
confidence = _read_path(result, "channel", "alternatives", 0, "confidence")
if confidence is None:
return None
normalized = float(confidence)
if 0.0 <= normalized <= 1.0:
return round(normalized, 4)
return None
class CovalDeepgramSTTService(DeepgramSTTService):
"""Adds stt.confidence to Pipecat's built-in traced `stt` spans."""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
self._current_stt_confidence = None
async def _on_message(self, *args, **kwargs):
result = kwargs.get("result")
is_final = bool(getattr(result, "is_final", False)) if result else False
self._current_stt_confidence = extract_stt_confidence(result) if is_final else None
try:
await super()._on_message(*args, **kwargs)
finally:
if is_final:
self._current_stt_confidence = None
@traced_stt
async def _handle_transcription(self, transcript, is_final, language=None):
if is_final and self._current_stt_confidence is not None:
span = otel_trace.get_current_span()
if span.is_recording():
span.set_attribute("stt.confidence", self._current_stt_confidence)
Instantiate the subclass in your pipeline. With PipelineTask(..., enable_tracing=True), Pipecat still emits the standard stt span, and the subclass adds stt.confidence onto that same span:
stt = CovalDeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
pipeline = Pipeline([
transport.input(),
stt,
context_aggregator.user(),
llm,
tts,
transport.output(),
])
For non-Pipecat agents, emit equivalent spans wherever your STT returns final transcriptions:
from opentelemetry import trace as otel_trace
tracer = otel_trace.get_tracer("my-stt-instrumentation")
with tracer.start_as_current_span("stt") as span:
span.set_attribute("transcript", transcription_text)
if confidence_score is not None:
span.set_attribute("stt.confidence", confidence_score)
The span must be named "stt" and include the transcript attribute with the transcribed text. stt.confidence is optional, but when present it should be a 0.0–1.0 score for the final utterance.
Instrumenting LLM Spans
Include llm.finish_reason on llm spans so you can tell why the model stopped generating. This is especially useful when debugging responses that were silently cut off because llm.finish_reason=length.
Here is a Pipecat example that enriches the built-in traced llm span:
from opentelemetry import trace as otel_trace
from pipecat.services.openai.llm import OpenAILLMService
def _read_path(value, *path):
current = value
for segment in path:
if current is None:
return None
if isinstance(segment, int):
if isinstance(current, (list, tuple)) and 0 <= segment < len(current):
current = current[segment]
else:
return None
continue
if isinstance(current, dict):
current = current.get(segment)
else:
current = getattr(current, segment, None)
return current
class _FinishReasonTrackingStream:
def __init__(self, stream):
self._stream = stream
self._iter = stream.__aiter__()
def __aiter__(self):
return self
async def __anext__(self):
chunk = await self._iter.__anext__()
finish_reason = _read_path(chunk, "choices", 0, "finish_reason")
if finish_reason is not None:
span = otel_trace.get_current_span()
if span.is_recording():
span.set_attribute("llm.finish_reason", str(finish_reason))
return chunk
async def aclose(self):
if hasattr(self._iter, "aclose"):
await self._iter.aclose()
elif hasattr(self._stream, "aclose"):
await self._stream.aclose()
async def close(self):
if hasattr(self._stream, "close"):
await self._stream.close()
else:
await self.aclose()
class CovalOpenAILLMService(OpenAILLMService):
"""Adds llm.finish_reason to Pipecat's built-in traced `llm` spans."""
async def get_chat_completions(self, params_from_context):
stream = await super().get_chat_completions(params_from_context)
return _FinishReasonTrackingStream(stream)
For non-Pipecat agents, set the attribute directly on your llm span after the provider response finishes:
with tracer.start_as_current_span("llm") as span:
    response = client.chat.completions.create(...)
    finish_reason = response.choices[0].finish_reason
    if finish_reason:
        span.set_attribute("llm.finish_reason", finish_reason)
Common values include stop, length, tool_calls, and content_filter.
Provider Fallback Spans
Many voice agents use a provider fallback chain for STT — for example, Deepgram → Google → Azure. Without per-provider spans, a single stt span only shows the final result; there is no visibility into which provider served the call, how long each attempt took, or why a fallback triggered.
The convention is to create one stt.provider.<name> child span per provider attempt, nested inside the parent stt span:
stt ← parent span: final result
└── stt.provider.deepgram ← attempt 1 (succeeded)
Or for a fallback:
stt ← parent span: final result
├── stt.provider.deepgram ← attempt 1 (failed, span status = ERROR)
└── stt.provider.google ← attempt 2 (succeeded)
Span attributes
| Attribute | Type | Description |
|---|---|---|
| stt.providerName | string | Provider name, e.g. "deepgram", "google", "azure" |
| stt.confidence | float | ASR confidence score from this provider (0.0–1.0) |
| metrics.ttfb | float | Time to first byte for this provider attempt (seconds) |
Code example
import time
from opentelemetry import trace as otel_trace
tracer = otel_trace.get_tracer("my-stt-instrumentation")
def transcribe_with_fallback(audio):
providers = [("deepgram", deepgram_stt), ("google", google_stt)]
final_transcript = None
with tracer.start_as_current_span("stt") as stt_span:
for provider_name, stt_fn in providers:
attempt_start = time.time()
with tracer.start_as_current_span(f"stt.provider.{provider_name}") as provider_span:
provider_span.set_attribute("stt.providerName", provider_name)
try:
result = stt_fn(audio)
ttfb = time.time() - attempt_start
provider_span.set_attribute("metrics.ttfb", round(ttfb, 4))
confidence = getattr(result, "confidence", None)
if confidence is not None:
provider_span.set_attribute("stt.confidence", confidence)
final_transcript = result.transcript
break # success — stop trying fallbacks
except Exception as e:
provider_span.set_status(otel_trace.StatusCode.ERROR, str(e))
if final_transcript:
stt_span.set_attribute("transcript", final_transcript)
return final_transcript
Viewing Traces in Coval
After a simulation completes or monitoring traces are received, an OTel Traces card automatically appears in the metric grid on the result page when trace data is available. The card shows the total span count and a View Traces button that navigates directly to the trace viewer.
To view traces: open a run or monitoring result, click into a result, and click the OTel Traces card. You can also navigate directly via URL:
https://app.coval.dev/<org-slug>/runs/<run-id>/results/<simulation-output-id>/traces
Traces appear within a few seconds of the simulation completing or being submitted.
Trace viewer features
The trace viewer has two visualization modes you can switch between using the toggle in the header:
Waterfall view — Shows spans as horizontal bars on a timeline, nested by parent-child relationships. Use the collapse/expand controls to focus on specific parts of the call hierarchy. You can filter by span type using the color-coded legend pills in the header.
Flame graph view — Shows all spans stacked by depth, giving a bird's-eye view of where time is spent. Interactions include:
- Scroll to pan the timeline left/right
- Ctrl/Cmd + scroll to zoom in and out
- Drag-select a region to zoom into that time range
- Double-click a span to zoom to fit that span’s duration
- Press F to reset the view to fit the full trace
- A mini-map above the flame graph shows the full trace with your current viewport highlighted — drag it to pan quickly
In both views, clicking any span opens a detail panel on the right showing the span’s attributes, timing, status, and parent chain. When no span is selected, the detail panel shows a trace summary with total spans, duration, span type breakdown with time percentages, slowest spans, and any error spans.
Transition Hotspots
Transition Hotspots give you a run-level view of how conversations flow through your agent’s states — and where they fail. Rather than inspecting individual simulations one by one, you can see the full distribution of state-to-state transitions across an entire run at a glance.
Walkthrough
Accessing Transition Hotspots
The Hotspots tab appears on the run results page when at least one simulation in the run has OTel trace data. Navigate to a run, then click the Hotspots tab. If the tab is not visible, the run does not contain any traced simulations.
You can also access it directly via the ?view=hotspots query parameter on the run results URL.
Reading the Heatmap
The Hotspots view displays a heatmap matrix where:
- Rows represent the origin state of a transition (the “from” state)
- Columns represent the destination state (the “to” state)
- Each cell represents a pair of states — for example, “greeting → account_lookup”
Toggle between two views using the buttons in the header:
| View | Description |
|---|---|
| Counts | Each cell shows how many times that state-to-state transition occurred across all simulations in the run |
| Failure Rate | Each cell shows the percentage of simulations that failed when hitting that transition |
Darker cells indicate higher counts or higher failure rates, depending on the active view.
Drilling Down
Click any cell in the heatmap to open a detail panel showing:
- The total count and failure count for that transition
- Exemplar simulations — individual simulations that passed through that state transition, with direct links to review them
Use exemplars to understand why a particular transition has a high failure rate: open a failing simulation and inspect the transcript and trace together.
The Top Hotspots sidebar ranks state transitions by failure count, making it easy to find the most impactful problems without scanning the full matrix. The top-ranked transitions are the ones where the most simulations failed.
Span Filters
Use the span type filters to include or exclude specific span types from the transition analysis. Wrapper spans — such as conversation, pipeline, transport, and session:* spans — are automatically collapsed and filtered by default, so the heatmap focuses on the meaningful transitions within your agent’s processing logic.
Start with the Failure Rate view to find which transitions are most problematic, then switch to Counts to understand the volume. A transition with a 100% failure rate but only 1 occurrence is less concerning than one with a 30% failure rate across 50 simulations.
Full Example
from opentelemetry.sdk import trace as trace_sdk
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace.export import SimpleSpanProcessor
from opentelemetry.sdk.resources import SERVICE_NAME, Resource
# Configure tracer
resource = Resource.create({SERVICE_NAME: "my-agent"})
provider = trace_sdk.TracerProvider(resource=resource)
exporter = OTLPSpanExporter(
endpoint="https://api.coval.dev/v1/traces",
headers={
"X-API-Key": "<COVAL_API_KEY>",
"X-Simulation-Id": "<simulation_id>",
},
timeout=30,
)
provider.add_span_processor(SimpleSpanProcessor(exporter))
tracer = provider.get_tracer("my-agent")
# Use tracer in agent code
with tracer.start_as_current_span("tool_call") as span:
span.set_attribute("tool.name", "search_database")
result = call_tool()
# Call on agent exit for clean resource teardown.
provider.shutdown()
Using Span Attributes in Custom Metrics
Any numeric span attribute your agent emits can be measured using a Custom Trace Metric (METRIC_CUSTOM_TRACE). This lets you track latency, token counts, or any other numeric value from your traces without writing custom evaluation code.
To create a custom trace metric, specify:
- Span Name — the span_name of the spans to aggregate (e.g. llm, tts, or any custom span you create)
- Metric Attribute — the span attribute key containing the numeric value (e.g. metrics.ttfb, token_count)
- Aggregation Method — how to aggregate across turns: average, median, p90, max, or min
See Create Metric for the full API reference.