Skip to main content

Overview

Twilio ConversationRelay lets you connect a Twilio Programmable Voice call to a WebSocket server that handles STT → LLM → TTS in real time. This guide covers how to:
  1. Build an OTel span tree from ConversationRelay events and export it to Coval
  2. Correlate traces with Coval simulation runs despite Twilio PSTN stripping SIP headers
For a complete working implementation, see the coval-examples Twilio agent on GitHub.

Walkthrough

The PSTN limitation

When Coval places a simulation call to your agent, it normally passes the simulation output ID as a custom SIP header:
X-Coval-Simulation-Id: <simulation-id>
This works for agents using SIP trunking (Telnyx, custom SBCs) where the SIP signaling layer is preserved end-to-end. Twilio Programmable Voice, however, routes calls through the public telephone network (PSTN). PSTN carriers strip non-standard SIP headers, so X-Coval-Simulation-Id never reaches your application.

Solution: pre_call_webhook_url

Coval supports an alternative correlation mechanism for agents where SIP headers are unavailable. Configure pre_call_webhook_url on your agent and Coval will POST the simulation output ID to your agent before dialing, giving it a chance to stash the ID before the call connects. The webhook is called once per simulation, immediately before the outbound call is placed. It receives:
{
  "simulation_output_id": "<sim-output-id>",
  "from_number": "+16504471573",
  "to_number": "+15105077509"
}
from_number is the caller ID Coval will dial from. to_number is your agent’s phone number. Use these to correlate the webhook with the incoming call — especially useful when running multiple agent replicas behind a load balancer. Your agent queues this ID, then pops it when the next call arrives.

Configure your Twilio phone number

Before Coval can place simulation calls to your agent, your Twilio phone number must be configured to route inbound calls to your agent’s webhook.
  1. Open the Twilio Console and navigate to Phone Numbers → Manage → Active Numbers.
  2. Click the phone number you want to use for simulations.
  3. Scroll to the Voice Configuration section.
  4. Set Configure with to Webhook, TwiML Bin, Function, Studio Flow, Proxy Service.
  5. Under A call comes in, select Webhook and enter your agent’s webhook URL (e.g. https://your-agent.fly.dev/webhook). Leave HTTP method as HTTP POST.
  6. Save the configuration.
The /webhook endpoint is the entry point for all inbound calls. When Twilio receives a call on your number, it POSTs to this URL and expects TwiML in response — typically a <Connect><ConversationRelay> instruction pointing at your WebSocket handler.

Coval agent configuration

In the Coval dashboard, open your agent’s settings and set the following in the agent metadata:
{
  "pre_call_webhook_url": "https://your-agent.fly.dev/register-simulation",
  "pre_call_webhook_headers": {"x-api-key": "<your-agent-api-key>"}
}
FieldDescription
pre_call_webhook_urlThe URL Coval will POST to before each simulation call
pre_call_webhook_headersOptional headers to include — use this to authenticate Coval’s request to your agent
Use COVAL_API_KEY (your Coval API key) as the value for x-api-key and validate it in your /register-simulation handler. This prevents other callers from pre-registering IDs.

Agent implementation

/register-simulation endpoint

Add an endpoint that accepts Coval’s pre-call notification and queues the simulation ID:
import time
from collections import deque
from typing import Optional
from fastapi import FastAPI, Header, HTTPException, Request
from fastapi.responses import JSONResponse

app = FastAPI()
COVAL_API_KEY = os.environ.get("COVAL_API_KEY", "")

# Queue of (simulation_output_id, from_number, registered_at) tuples
_pending_sim_ids: deque[tuple[str, str, float]] = deque()
_SIM_ID_TTL_SECONDS = 300  # expire after 5 minutes


def _pop_pending_sim_id(caller_id: str = "") -> Optional[str]:
    """Return the simulation ID matching the caller, or the oldest non-expired one."""
    now = time.time()
    # Purge expired entries
    while _pending_sim_ids and now - _pending_sim_ids[0][2] > _SIM_ID_TTL_SECONDS:
        _pending_sim_ids.popleft()

    # Try to match by caller ID first
    for i, (sim_id, from_number, _) in enumerate(_pending_sim_ids):
        if from_number and caller_id and from_number == caller_id:
            del _pending_sim_ids[i]
            return sim_id

    # Fall back to FIFO if no caller ID match
    if _pending_sim_ids:
        sim_id, _, _ = _pending_sim_ids.popleft()
        return sim_id
    return None


@app.post("/register-simulation")
async def register_simulation(
    request: Request,
    x_api_key: str = Header(default=""),
):
    if not COVAL_API_KEY or x_api_key != COVAL_API_KEY:
        raise HTTPException(status_code=401, detail="Invalid API key")

    body = await request.json()
    simulation_output_id = body.get("simulation_output_id", "")
    if not simulation_output_id:
        raise HTTPException(status_code=400, detail="simulation_output_id is required")

    from_number = body.get("from_number", "")
    _pending_sim_ids.append((simulation_output_id, from_number, time.time()))
    return JSONResponse({"status": "ok", "queued": len(_pending_sim_ids)})

Reading the simulation ID on call arrival

In your ConversationRelay WebSocket handler, pop the pending ID when the "setup" event arrives:
@app.websocket("/ws")
async def conversationrelay_websocket(websocket: WebSocket):
    await websocket.accept()
    simulation_id: Optional[str] = None

    async for raw_message in websocket.iter_text():
        event = json.loads(raw_message)
        event_type = event.get("type", "")

        if event_type in ("setup", "connected"):
            # Match by caller ID from the webhook registration
            caller_id = event.get("from", "")
            simulation_id = _pop_pending_sim_id(caller_id)

        elif event_type == "prompt":
            voice_prompt = event.get("voicePrompt", "")
            # ... call LLM, stream response back to Twilio ...
For multi-replica deployments (Kubernetes, auto-scaling groups), store the pending simulation IDs in a shared datastore (e.g. Redis, your application database) instead of an in-memory queue. The from_number field lets any replica match the incoming call to the correct simulation, regardless of which replica received the webhook.

Exporting traces after the call

When the WebSocket closes, build OTLP spans from your turn log and POST them to Coval:
import httpx

COVAL_TRACES_URL = "https://api.coval.dev/v1/traces"

def _send_spans(spans: list[dict], simulation_id: str) -> None:
    payload = {
        "resourceSpans": [
            {
                "resource": {
                    "attributes": [
                        {"key": "service.name", "value": {"stringValue": "twilio-voice-agent"}}
                    ]
                },
                "scopeSpans": [
                    {
                        "scope": {"name": "twilio-voice-agent"},
                        "spans": spans,
                    }
                ],
            }
        ]
    }
    httpx.post(
        COVAL_TRACES_URL,
        json=payload,
        headers={
            "x-api-key": COVAL_API_KEY,
            "X-Simulation-Id": simulation_id,
        },
        timeout=30,
    )
Call _send_spans in the finally block of your WebSocket handler, after the call ends:
    finally:
        if simulation_id and turns:
            call_duration_seconds = time.time() - call_start_epoch_seconds
            spans = _build_spans_from_turns(turns, call_start_epoch_seconds, call_duration_seconds)
            _send_spans(spans, simulation_id)

Async audio attachment (multi-replica deployments)

Twilio Programmable Voice recording URLs are not retrievable at call end — Twilio finalizes the recording asynchronously, typically ~60 seconds later. If your agent runs as a single long-lived process you can simply wait, but in a multi-replica Kubernetes or Fly.io deployment the agent container may be terminated before the URL becomes available. For those deployments, split conversation submission across two API calls:
  1. At call end, POST /v1/conversations:submit with the transcript only. You immediately get a conversation_id and text-only metrics start running. Flush your buffered OTel spans with this conversation_id so traces correlate with the conversation.
  2. When the recording URL is ready, PATCH /v1/conversations/{conversation_id} with the audio_url. Audio-dependent metrics then run as a second wave.
Each wave delivers a separate webhook. Configure your consumer to expect both.
# Step 1: submit transcript at call end (returns conversation_id immediately)
curl -X POST https://api.coval.dev/v1/conversations:submit \
  -H 'x-api-key: $COVAL_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "transcript": [...],
    "external_conversation_id": "CAxxxxx"
  }'

# Step 2: attach audio once the Twilio recording URL is ready
curl -X PATCH https://api.coval.dev/v1/conversations/${CONVERSATION_ID} \
  -H 'x-api-key: $COVAL_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{"audio_url": "https://api.twilio.com/.../Recordings/RExxxxx.wav"}'
Audio can only be attached once per conversation — a second PATCH /v1/conversations/{conversation_id} returns 409 ALREADY_EXISTS.

Trace limitations

ConversationRelay abstracts STT and TTS away from your application code entirely — you receive transcribed text in "prompt" events and send text tokens back; Twilio handles the rest. This means several span values cannot be measured and must be approximated. These are architectural constraints of the ConversationRelay model, not implementation choices.
The following trace values are synthetic when using Twilio ConversationRelay. Do not use them for latency analysis, benchmarking, or metric thresholds.
ValueWhy it must be synthetic
sttmetrics.ttfbTwilio performs speech recognition internally. Your application only receives the final transcribed text in a "prompt" WebSocket event — there is no timestamp for when speech started or when transcription completed.
sttstt.confidenceTwilio does not expose per-utterance ASR confidence scores through the ConversationRelay WebSocket API. Fixed at 0.95.
ttsmetrics.ttfbTwilio converts your text tokens to audio internally. Your application has no visibility into when audio playback begins at the caller’s end. Fixed at 0.1s.
The one value that is real: llmmetrics.ttfb. Because your application makes the LLM API call directly, you can measure wall-clock time from when the "prompt" event arrives to when the first response token is sent back. This is the only latency signal from ConversationRelay traces worth trusting. Practical implication: Coval’s built-in STT TTFB and TTS TTFB latency metrics will not reflect real performance for Twilio ConversationRelay agents. LLM TTFB metrics will. If you need real STT/TTS timing data, consider a framework where you control the STT and TTS API calls directly (e.g., Pipecat or LiveKit), which expose those timings to your instrumentation code.

Span schema

SpanKey attributesNotes
conversationcall.duration_secondsRoot span
sttstt.transcription, metrics.ttfb (synthetic), stt.confidence (synthetic 0.95)One per user turn
stt.provider.twiliostt.providerName, stt.confidence, metrics.ttfbChild of stt
llmmetrics.ttfb (real), llm.finish_reasonOne per assistant turn
ttsmetrics.ttfb (synthetic 0.1s)One per assistant turn
tool_calltool.name, tool.call_id, tool.argumentsWhen tools are invoked
tool_call_resulttool.name, tool.call_id, tool.resultStatus = ERROR if tool returned an error

Viewing traces

After a simulation completes, an OTel Traces card appears in the metric grid on the result page. Click View Traces to open the trace viewer. If no traces appear, check:
  1. pre_call_webhook_url is set on the Coval agent and points to the correct URL
  2. Your /register-simulation endpoint is publicly accessible and returning 200 OK
  3. The COVAL_API_KEY in the pre_call_webhook_headers matches what your agent expects
  4. COVAL_API_KEY is set in the agent environment (needed to export spans)

Full example

See the complete working implementation in coval-examples/voice-agents/twilio, which includes:
  • ConversationRelay WebSocket handler with interrupt support
  • Agentic LLM loop (tool calls → re-enter loop until finish_reason = stop)
  • Full span builder with real LLM TTFB measurement
  • Fly.io deployment configuration