Overview

Twilio ConversationRelay lets you connect a Twilio Programmable Voice call to a WebSocket server that handles STT → LLM → TTS in real time. This guide covers how to:
  1. Build an OTel span tree from ConversationRelay events and export it to Coval
  2. Correlate traces with Coval simulation runs despite Twilio PSTN stripping SIP headers
For a complete working implementation, see the coval-examples Twilio agent on GitHub.

The PSTN limitation

When Coval places a simulation call to your agent, it normally passes the simulation output ID as a custom SIP header:
X-Coval-Simulation-Id: <simulation-id>
This works for agents using SIP trunking (Telnyx, custom SBCs) where the SIP signaling layer is preserved end-to-end. Twilio Programmable Voice, however, routes calls through the public telephone network (PSTN). PSTN carriers strip non-standard SIP headers, so X-Coval-Simulation-Id never reaches your application.

Solution: pre_call_webhook_url

Coval supports an alternative correlation mechanism for agents where SIP headers are unavailable. Configure pre_call_webhook_url on your agent and Coval will POST the simulation output ID to your agent before dialing, giving it a chance to stash the ID before the call connects. The webhook is called once per simulation, immediately before the outbound call is placed. It receives:
{
  "simulation_output_id": "<sim-output-id>"
}
Your agent queues this ID, then pops it when the next call arrives.

Coval agent configuration

In the Coval dashboard, open your agent’s settings and set the following in the agent metadata:
{
  "pre_call_webhook_url": "https://your-agent.fly.dev/register-simulation",
  "pre_call_webhook_headers": {"x-api-key": "<your-agent-api-key>"}
}
| Field | Description |
| --- | --- |
| pre_call_webhook_url | The URL Coval will POST to before each simulation call |
| pre_call_webhook_headers | Optional headers to include; use this to authenticate Coval’s request to your agent |
Use COVAL_API_KEY (your Coval API key) as the value for x-api-key and validate it in your /register-simulation handler. This prevents other callers from pre-registering IDs.
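The validation step can use a constant-time comparison, so that response timing does not reveal how much of a guessed key matched. A minimal sketch using only the standard library; the helper name `is_valid_api_key` is hypothetical and not part of the Coval example:

```python
import hmac


def is_valid_api_key(presented: str, expected: str) -> bool:
    """Return True only if a non-empty expected key matches the presented one.

    hmac.compare_digest compares in constant time, so the comparison
    does not short-circuit on the first differing byte.
    """
    if not expected:
        # Fail closed when the server-side key is not configured.
        return False
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))
```

In a handler, this would stand in for a plain `!=` check between the received header and `COVAL_API_KEY`.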

Agent implementation

/register-simulation endpoint

Add an endpoint that accepts Coval’s pre-call notification and queues the simulation ID:
import json
import os
import time
from collections import deque
from typing import Optional
from fastapi import FastAPI, Header, HTTPException, Request, WebSocket
from fastapi.responses import JSONResponse

app = FastAPI()
COVAL_API_KEY = os.environ.get("COVAL_API_KEY", "")

# FIFO queue of (simulation_output_id, registered_at) tuples
_pending_sim_ids: deque[tuple[str, float]] = deque()
_SIM_ID_TTL_SECONDS = 300  # expire after 5 minutes


def _pop_pending_sim_id() -> Optional[str]:
    """Return the oldest non-expired pending simulation ID, or None."""
    now = time.time()
    while _pending_sim_ids:
        sim_id, registered_at = _pending_sim_ids[0]
        if now - registered_at > _SIM_ID_TTL_SECONDS:
            _pending_sim_ids.popleft()  # expired, discard
        else:
            break
    if _pending_sim_ids:
        sim_id, _ = _pending_sim_ids.popleft()
        return sim_id
    return None


@app.post("/register-simulation")
async def register_simulation(
    request: Request,
    x_api_key: str = Header(default=""),
):
    if not COVAL_API_KEY or x_api_key != COVAL_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

    body = await request.json()
    simulation_output_id = body.get("simulation_output_id", "")
    if not simulation_output_id:
        raise HTTPException(status_code=400, detail="simulation_output_id is required")

    _pending_sim_ids.append((simulation_output_id, time.time()))
    return JSONResponse({"status": "ok", "queued": len(_pending_sim_ids)})
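To see how the FIFO and TTL interact, the same queue logic can be sketched as a small class with an injectable clock, which lets expiry be exercised without sleeping. The class name `PendingSimulationQueue` and the timestamps are hypothetical; this is an illustration of the behavior, not the code from the example repository:

```python
import time
from collections import deque
from typing import Callable, Optional


class PendingSimulationQueue:
    """FIFO of simulation IDs that drops entries older than the TTL."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._items: deque[tuple[str, float]] = deque()

    def push(self, sim_id: str) -> None:
        self._items.append((sim_id, self._clock()))

    def pop(self) -> Optional[str]:
        now = self._clock()
        # Entries are in arrival order, so expired ones are always at the front.
        while self._items and now - self._items[0][1] > self._ttl:
            self._items.popleft()
        return self._items.popleft()[0] if self._items else None


# Usage with a fake clock (hypothetical values)
fake_now = [1000.0]
queue = PendingSimulationQueue(ttl_seconds=300, clock=lambda: fake_now[0])
queue.push("sim-a")
fake_now[0] += 200
queue.push("sim-b")
fake_now[0] += 150  # sim-a is now 350 s old (expired); sim-b is 150 s old
first = queue.pop()   # "sim-b", because sim-a was discarded as expired
second = queue.pop()  # None, the queue is empty
```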

Reading the simulation ID on call arrival

In your ConversationRelay WebSocket handler, pop the pending ID when the "setup" event arrives:
@app.websocket("/ws")
async def conversationrelay_websocket(websocket: WebSocket):
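    await websocket.accept()
    simulation_id: Optional[str] = None
    call_start_epoch_seconds = time.time()  # used to compute call duration at export time
    turns = []  # per-turn log, appended to as prompts are handled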

    async for raw_message in websocket.iter_text():
        event = json.loads(raw_message)
        event_type = event.get("type", "")

        if event_type in ("setup", "connected") and simulation_id is None:
            # Pop the pre-registered simulation ID once per call, even if
            # more than one of these events arrives on the same connection
            simulation_id = _pop_pending_sim_id()

        elif event_type == "prompt":
            voice_prompt = event.get("voicePrompt", "")
            # ... call LLM, stream response back to Twilio ...
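For the elided streaming step, the response goes back to Twilio as text messages over the same WebSocket. The sketch below assumes the `text` message shape with `token` and `last` fields described in Twilio's ConversationRelay documentation; check the field names against the current docs before relying on them. `text_token_messages` is a hypothetical helper:

```python
import json
from typing import Iterable, Iterator


def text_token_messages(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one JSON text message per token, flagging the final one with last=True."""
    iterator = iter(tokens)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # nothing to send
    # Hold each token back by one step so the final token can be marked as last.
    for current in iterator:
        yield json.dumps({"type": "text", "token": previous, "last": False})
        previous = current
    yield json.dumps({"type": "text", "token": previous, "last": True})


# Inside the handler, each message would be sent with:
#     await websocket.send_text(message)
messages = list(text_token_messages(["Hello", ", ", "world."]))
```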

Exporting traces after the call

When the WebSocket closes, build OTLP spans from your turn log and POST them to Coval:
import httpx

COVAL_TRACES_URL = "https://api.coval.dev/v1/traces"

def _send_spans(spans: list[dict], simulation_id: str) -> None:
    payload = {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": "twilio-voice-agent"}}
                    ]
                },
                "scopeSpans": [
                    {
                        "scope": {"name": "twilio-voice-agent"},
                        "spans": spans,
                    }
                ],
            }
        ]
    }
    httpx.post(
        COVAL_TRACES_URL,
        json=payload,
        headers={
            "x-api-key": COVAL_API_KEY,
            "X-Simulation-Id": simulation_id,
        },
        timeout=30,
    )
Call _send_spans in the finally block of your WebSocket handler, after the call ends:
    finally:
        if simulation_id and turns:
            call_duration_seconds = time.time() - call_start_epoch_seconds
            spans = _build_spans_from_turns(turns, call_start_epoch_seconds, call_duration_seconds)
            _send_spans(spans, simulation_id)

Trace limitations

ConversationRelay abstracts STT and TTS away from your application code entirely — you receive transcribed text in "prompt" events and send text tokens back; Twilio handles the rest. This means several span values cannot be measured and must be approximated. These are architectural constraints of the ConversationRelay model, not implementation choices.
The following trace values are synthetic when using Twilio ConversationRelay. Do not use them for latency analysis, benchmarking, or metric thresholds.
| Value | Why it must be synthetic |
| --- | --- |
| stt metrics.ttfb | Twilio performs speech recognition internally. Your application only receives the final transcribed text in a "prompt" WebSocket event; there is no timestamp for when speech started or when transcription completed. |
| stt stt.confidence | Twilio does not expose per-utterance ASR confidence scores through the ConversationRelay WebSocket API. Fixed at 0.95. |
| tts metrics.ttfb | Twilio converts your text tokens to audio internally. Your application has no visibility into when audio playback begins at the caller’s end. Fixed at 0.1s. |
The one value that is real: llm metrics.ttfb. Because your application makes the LLM API call directly, you can measure wall-clock time from when the "prompt" event arrives to when the first response token is sent back. This is the only latency signal from ConversationRelay traces worth trusting.

Practical implication: Coval’s built-in STT TTFB and TTS TTFB latency metrics will not reflect real performance for Twilio ConversationRelay agents. LLM TTFB metrics will. If you need real STT/TTS timing data, consider a framework where you control the STT and TTS API calls directly (e.g., Pipecat or LiveKit), which expose those timings to your instrumentation code.
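The LLM TTFB measurement can be sketched as a thin wrapper around any token stream. `TtfbTimer`, `stream_with_ttfb`, and the `fake_llm` generator are hypothetical stand-ins for a real streaming LLM client; the timer is marked when the first token arrives from the model, immediately before it is forwarded:

```python
import asyncio
import time
from typing import AsyncIterator, Optional


class TtfbTimer:
    """Records seconds from construction to the first call of mark_first_token()."""

    def __init__(self) -> None:
        self._start = time.monotonic()  # monotonic: unaffected by system clock changes
        self.ttfb_seconds: Optional[float] = None

    def mark_first_token(self) -> None:
        if self.ttfb_seconds is None:  # only the first token counts
            self.ttfb_seconds = time.monotonic() - self._start


async def stream_with_ttfb(tokens: AsyncIterator[str], timer: TtfbTimer) -> AsyncIterator[str]:
    """Pass tokens through unchanged, marking the timer when the first one arrives."""
    async for token in tokens:
        timer.mark_first_token()
        yield token


async def fake_llm() -> AsyncIterator[str]:
    await asyncio.sleep(0.05)  # simulated model latency before the first token
    yield "Hello"
    yield " there"


async def demo():
    timer = TtfbTimer()  # create the timer when the "prompt" event arrives
    received = [tok async for tok in stream_with_ttfb(fake_llm(), timer)]
    return received, timer.ttfb_seconds


received, ttfb = asyncio.run(demo())
```

The resulting `ttfb` would be stored on the turn and written to the `llm` span as `metrics.ttfb`.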

Span schema

| Span | Key attributes | Notes |
| --- | --- | --- |
| conversation | call.duration_seconds | Root span |
| stt | stt.transcription, metrics.ttfb (synthetic), stt.confidence (synthetic 0.95) | One per user turn |
| stt.provider.twilio | stt.providerName, stt.confidence, metrics.ttfb | Child of stt |
| llm | metrics.ttfb (real), llm.finish_reason | One per assistant turn |
| tts | metrics.ttfb (synthetic 0.1s) | One per assistant turn |
| tool_call | tool.name, tool.call_id, tool.arguments | When tools are invoked |
| tool_call_result | tool.name, tool.call_id, tool.result | Status = ERROR if tool returned an error |
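As an illustration of the `tool_call_result` row, the error state can be carried in the span's `status` field. The sketch assumes the standard OTLP status codes (1 = OK, 2 = ERROR) encoded as integers in OTLP/JSON; the helper name and the convention that failed tools return `{"error": ...}` are hypothetical. It returns only the name, attributes, and status; IDs and timestamps would be added by the span builder.

```python
import json

STATUS_CODE_OK = 1
STATUS_CODE_ERROR = 2


def tool_result_span_fields(tool_name: str, call_id: str, result: dict) -> dict:
    """Return the name, attributes, and status portion of a tool_call_result span."""
    is_error = "error" in result  # assumption: failed tools return {"error": "..."}
    fields = {
        "name": "tool_call_result",
        "attributes": [
            {"key": "tool.name", "value": {"stringValue": tool_name}},
            {"key": "tool.call_id", "value": {"stringValue": call_id}},
            {"key": "tool.result", "value": {"stringValue": json.dumps(result)}},
        ],
        "status": {"code": STATUS_CODE_ERROR if is_error else STATUS_CODE_OK},
    }
    if is_error:
        fields["status"]["message"] = str(result["error"])
    return fields
```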

Viewing traces

After a simulation completes, an OTel Traces card appears in the metric grid on the result page. Click View Traces to open the trace viewer. If no traces appear, check:
  1. pre_call_webhook_url is set on the Coval agent and points to the correct URL
  2. Your /register-simulation endpoint is publicly accessible and returning 200 OK
  3. The x-api-key value in pre_call_webhook_headers matches the COVAL_API_KEY your agent expects
  4. COVAL_API_KEY is set in the agent environment (needed to export spans)
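To check items 2 and 3 by hand, one option is to emulate Coval's pre-call POST against your own endpoint. The sketch below only builds the request; the URL, key, and simulation ID are placeholders, and `build_register_request` is a hypothetical helper:

```python
import json
import urllib.request


def build_register_request(url: str, api_key: str, simulation_output_id: str) -> urllib.request.Request:
    """Build a POST shaped like the pre-call webhook described in this guide."""
    body = json.dumps({"simulation_output_id": simulation_output_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "x-api-key": api_key},
    )


request = build_register_request(
    "https://your-agent.fly.dev/register-simulation",  # placeholder URL
    "<your-agent-api-key>",                            # placeholder key
    "test-sim-id",                                     # placeholder simulation ID
)
# To actually send it (a correctly configured agent should answer 200):
#     with urllib.request.urlopen(request, timeout=10) as response:
#         print(response.status, response.read().decode())
```

A test ID registered this way stays in the queue until it expires or the next incoming call pops it, so avoid sending it right before a real simulation run.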

Full example

See the complete working implementation in coval-examples/voice-agents/twilio, which includes:
  • ConversationRelay WebSocket handler with interrupt support
  • Agentic LLM loop (tool calls → re-enter loop until finish_reason = stop)
  • Full span builder with real LLM TTFB measurement
  • Fly.io deployment configuration