Async audio attach

Overview

Most telephony stacks finalize the recording URL only after the call has ended, typically 30 to 90 seconds later. By then, many agent containers have already been recycled or scaled in, so submitting transcript and audio together in a single atomic post-call request is unreliable.

The solution is a two-call pattern. At call end, you POST /v1/conversations:submit with the transcript and metadata and receive a conversation_id back synchronously. Once your telephony platform reports that the recording URL is finalized, you PATCH /v1/conversations/{conversation_id} to attach it.

The benefit is that text-only metrics (anything derived from the transcript) start running the moment the conversation is submitted. Audio-derived metrics run as a second wave once the recording is attached.

The two-call flow

Submit transcript at call end

POST /v1/conversations:submit accepts the transcript and returns a conversation_id synchronously. Text-only metrics begin running immediately. The conversation enters IN_PROGRESS status.

curl -X POST https://api.coval.dev/v1/conversations:submit \
  -H "x-api-key: $COVAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "transcript": [
      {"role": "agent", "content": "Hello, how can I help you today?"},
      {"role": "user", "content": "I need to check my order status."}
    ],
    "agent_id": "your-agent-id",
    "external_conversation_id": "CA1234567890abcdef",
    "occurred_at": "2026-05-13T18:42:00Z",
    "metadata": {
      "channel": "twilio-pstn",
      "from_number": "+14155550100"
    }
  }'

Response:

{
  "conversation_id": "conv_01HXYZ...",
  "status": "IN_PROGRESS",
  "has_audio": false
}

Store the returned conversation_id. You will need it for the PATCH call.
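As a rough illustration of this step, the Python sketch below builds the same submit request with the standard library and records the returned conversation_id against the call identifier. The helper names (build_submit_request, remember_conversation) and the dict-like store are assumptions made for illustration, not part of the API. A real service would likely use a durable store that survives container recycling.

```python
import json
import urllib.request

API_BASE = "https://api.coval.dev"  # base URL taken from the curl example above


def build_submit_request(api_key, payload):
    """Build (but do not send) the POST /v1/conversations:submit request."""
    return urllib.request.Request(
        f"{API_BASE}/v1/conversations:submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def remember_conversation(store, payload, response_body):
    """Map external_conversation_id -> conversation_id for the later PATCH.

    `store` is a hypothetical dict-like stand-in for a durable key-value store.
    """
    conversation_id = json.loads(response_body)["conversation_id"]
    store[payload["external_conversation_id"]] = conversation_id
    return conversation_id
```

The request can then be sent with urllib.request.urlopen(request), and the response body passed to remember_conversation.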

Attach the recording URL when ready

Once the recording URL finalizes (your telephony stack will deliver this via a webhook or callback), PATCH /v1/conversations/{conversation_id} with audio_url. Audio-derived metrics now run as a second wave.

curl -X PATCH "https://api.coval.dev/v1/conversations/$CONVERSATION_ID" \
  -H "x-api-key: $COVAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "audio_url": "https://api.twilio.com/2010-04-01/Accounts/AC.../Recordings/RE....wav"
  }'

Response:

{
  "conversation_id": "conv_01HXYZ...",
  "status": "IN_PROGRESS",
  "has_audio": true
}

See the full PATCH conversation API reference for all supported fields. You can also attach inline base64 audio via the audio field rather than a remote URL.
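The attach step might be wired to a recording-ready callback roughly as follows. This is a minimal sketch: on_recording_ready, its arguments, and the dict-like store that maps call IDs to conversation IDs are hypothetical names, and the request is only built here, not sent.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.coval.dev"  # base URL taken from the curl example above


def build_attach_request(api_key, conversation_id, audio_url):
    """Build (but do not send) the PATCH that attaches a finalized recording URL."""
    # Percent-encode the ID so it always stays a single path segment.
    path_id = urllib.parse.quote(conversation_id, safe="")
    return urllib.request.Request(
        f"{API_BASE}/v1/conversations/{path_id}",
        data=json.dumps({"audio_url": audio_url}).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="PATCH",
    )


def on_recording_ready(store, api_key, call_id, recording_url):
    """Hypothetical callback handler: look up the stored ID, then build the PATCH.

    Returns None when no conversation was submitted for this call.
    """
    conversation_id = store.get(call_id)
    if conversation_id is None:
        return None
    return build_attach_request(api_key, conversation_id, recording_url)
```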

Full submit request body:

{
  "transcript": [
    {"role": "agent", "content": "Hello, how can I help you today?"},
    {"role": "user", "content": "I need to check my order status."},
    {"role": "agent", "content": "I can help with that. Can I have your order number?"},
    {"role": "user", "content": "It is 12345."}
  ],
  "agent_id": "your-agent-id",
  "external_conversation_id": "CA1234567890abcdef",
  "occurred_at": "2026-05-13T18:42:00Z",
  "metadata": {
    "channel": "twilio-pstn",
    "from_number": "+14155550100",
    "to_number": "+18005551212",
    "tenant_id": "acme-corp"
  }
}

Webhook timing — two waves

Configure your webhook consumer to expect two waves.

Wave 1 (text-only metrics): fires within seconds of POST /v1/conversations:submit returning. Carries everything derived from the transcript.
Wave 2 (audio-derived metrics): fires only after the PATCH /v1/conversations/{conversation_id} lands and the recording has been processed. Carries metrics like STT WER, audio sentiment, and diarization-dependent scores.

If you store webhook payloads downstream, dedupe by external_conversation_id: both waves reference the same conversation, so the second wave should update the existing record rather than create a duplicate.
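One way to handle both waves is to keep a single record per external_conversation_id and merge each wave's metrics into it. The sketch below assumes a payload shape with an external_conversation_id field and a metrics object. That shape is hypothetical and is not taken from the webhook reference, so adjust the field names to match the payloads you actually receive.

```python
def merge_webhook_wave(records, payload):
    """Merge one webhook wave into the record for its conversation.

    ASSUMPTION: `payload` carries "external_conversation_id" and a "metrics"
    dict. `records` is a dict-like store keyed by external_conversation_id.
    """
    key = payload["external_conversation_id"]
    # Create the record on the first wave, reuse it on the second.
    record = records.setdefault(key, {"metrics": {}})
    record["metrics"].update(payload.get("metrics", {}))
    return record
```

With this approach, the text-only wave creates the record and the audio wave adds to it, so downstream consumers see one row per call.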

Idempotency

Audio can be attached once per conversation. The PATCH call is single-shot.

A second PATCH to the same conversation_id returns 409 ALREADY_EXISTS.
A conversation submitted with audio inline via POST /v1/conversations:submit (passing audio_url or audio in the initial body) already has audio attached and cannot be PATCHed.
To re-run with a corrected recording, submit a new conversation with a new external_conversation_id.

If your recording-status webhook fires more than once for the same call — some providers retry on transient receiver errors — your handler must be idempotent against the 409. Treat the first 200 as success and ignore subsequent 409s.
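A handler that tolerates duplicate recording callbacks could look like the following sketch. The function names and the returned labels ("attached", "already_attached") are illustrative assumptions. The only behavior taken from the text above is that 200 means success and 409 means audio is already attached.

```python
import urllib.error
import urllib.request


def send_with_urllib(request):
    """Send the request and return the HTTP status code, including error statuses."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for 4xx/5xx; surface the code instead.
        return err.code


def attach_audio_once(request, send=send_with_urllib):
    """Send the attach PATCH and treat a 409 as 'already done', not a failure."""
    status = send(request)
    if status == 200:
        return "attached"
    if status == 409:
        # A retry of the callback, or audio was supplied at submit time.
        return "already_attached"
    raise RuntimeError(f"unexpected status {status} from PATCH")
```

Passing send as a parameter keeps the status handling testable without network access. Other statuses are raised so that the caller's own retry policy can decide what to do.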

When you don’t need this

Skip the PATCH step if either of the following applies:

You control the media path. If your stack hands you the finalized recording URL or bytes at the moment the call ends — for example, you run your own media server and write the file before the agent disconnects — just call POST /v1/conversations:submit with audio_url (or audio) in the same payload. There is nothing to PATCH later.
You only need transcript-derived metrics. Submit the transcript with POST /v1/conversations:submit and stop there. The conversation will be evaluated against any metric that does not require audio, and has_audio will remain false.
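Both cases reduce to a single submit call whose body either includes audio_url or omits it. A minimal sketch, assuming a hypothetical helper named build_submit_payload:

```python
def build_submit_payload(transcript, agent_id, external_conversation_id,
                         audio_url=None):
    """Build the submit body, including audio_url only when it is already known."""
    payload = {
        "transcript": transcript,
        "agent_id": agent_id,
        "external_conversation_id": external_conversation_id,
    }
    if audio_url is not None:
        # Audio supplied at submit time: no PATCH is needed afterwards.
        payload["audio_url"] = audio_url
    return payload
```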
