
Overview

WebSocket connections enable real-time, bidirectional audio communication with your AI agents. This connection type is ideal for voice-based agents, real-time assistants, or any application that requires persistent, low-latency audio exchange. When you connect a WebSocket agent, Coval establishes a secure connection to your endpoint and handles the full audio conversation flow—sending audio chunks, receiving responses, and evaluating performance.
For text-only chat agents, see Chat WebSocket instead.

Connection Modes

The WebSocket connection type supports two connection modes:

Direct Mode (Default)

Connect directly to a WebSocket endpoint.
| Field | Required | Description |
| --- | --- | --- |
| WebSocket Endpoint | Yes | The wss:// URL to connect to |
| Initialization JSON | No | JSON payload sent immediately after connection |
| Authorization Header | No | Auth value sent during the WebSocket handshake |
| Custom Headers | No | Additional headers for the WebSocket handshake |

HTTP-First Mode

Call an HTTP endpoint first to create a session, then connect to the WebSocket URL returned in the response. Common with platforms that require session provisioning before establishing a WebSocket connection.
| Field | Required | Description |
| --- | --- | --- |
| HTTP Endpoint URL | Yes | The https:// URL to call for session setup |
| HTTP Method | No | Request method (default: POST) |
| Request Body | No | JSON body for the HTTP request |
| HTTP Headers | No | Headers for the HTTP request |
| WebSocket URL Response Path | Yes | Dot-notation path to the WebSocket URL in the response |
| Authorization Header | No | Auth value for the WebSocket connection (separate from HTTP headers) |
| Initialization JSON | No | JSON payload sent after WebSocket connection |
| Custom Headers | No | Additional headers for the WebSocket connection |
Flow:
  1. Coval sends an HTTP request to your setup endpoint
  2. Your API returns a response containing the WebSocket URL
  3. Coval connects to that WebSocket URL
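As a rough illustration of step 2 and the WebSocket URL Response Path field, the sketch below walks a dot-notation path through a parsed JSON response. The response shape, the field names (`data`, `session`, `ws_url`), and the helper `resolve_path` are hypothetical; Coval's own resolution logic may differ in details such as error handling.

```python
from typing import Any


def resolve_path(payload: dict[str, Any], path: str) -> Any:
    """Walk a dot-notation path such as 'data.session.ws_url' through nested dicts."""
    current: Any = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"segment {key!r} not found while resolving {path!r}")
        current = current[key]
    return current


# Hypothetical setup response; the field names are illustrative only.
response = {"data": {"session": {"ws_url": "wss://your-api.com/ws/session-123"}}}

# With this response, the WebSocket URL Response Path would be "data.session.ws_url".
ws_url = resolve_path(response, "data.session.ws_url")
```

If your setup endpoint returns the URL at the top level, the path is just the key name, for example `ws_url`.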

Configuration Requirements

WebSocket Endpoint

  • Field: endpoint
  • Type: String (required)
  • Purpose: The WebSocket URL that Coval connects to for simulations
  • Format: Must start with wss:// (secure WebSocket)
  • Example: wss://your-api.com/ws/voice
Only secure WebSocket connections (wss://) are supported. Plain ws:// endpoints will be rejected for security reasons.
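If you generate agent configurations programmatically, a local pre-check can catch scheme mistakes before you submit them. The helper below is a minimal sketch; `is_valid_endpoint` is a hypothetical name, and it only mirrors the rule stated above rather than reproducing Coval's actual validation.

```python
from urllib.parse import urlsplit


def is_valid_endpoint(url: str) -> bool:
    """Return True only for wss:// URLs with a host and no surrounding whitespace."""
    if url != url.strip():
        return False  # leading or trailing whitespace is a common copy-paste error
    parts = urlsplit(url)
    return parts.scheme == "wss" and bool(parts.netloc)


ok = is_valid_endpoint("wss://your-api.com/ws/voice")
plain = is_valid_endpoint("ws://your-api.com/ws/voice")
```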

Initialization JSON

  • Field: initialization_json
  • Type: JSON object (optional)
  • Purpose: Initial payload sent to your WebSocket endpoint when the connection is established
  • Format: Valid JSON object
  • Use Cases: Session initialization, authentication handshakes, context setup
Example:
{
  "action": "start_session",
  "session_type": "simulation",
  "metadata": {
    "source": "coval",
    "test_mode": true
  }
}
The initialization JSON is sent immediately after the WebSocket connection is established. Use this to configure your agent’s behavior or authenticate the session.

Authorization Header

  • Field: authorization_header
  • Type: String (optional)
  • Purpose: Authentication value sent in the Authorization header during the WebSocket handshake
  • Format: Standard authorization header value
  • Security: Stored encrypted and handled securely
Common formats:
  • Bearer token: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
  • API key: X-API-Key your-api-key-here
  • Basic auth: Basic base64-encoded-credentials

Custom Headers

  • Field: custom_headers
  • Type: JSON object (optional)
  • Purpose: Additional HTTP headers sent during the WebSocket handshake
  • Format: Valid JSON object with string key-value pairs
  • Use Cases: Custom authentication, routing headers, client identification
Example:
{
  "X-Client-ID": "coval-simulation",
  "X-API-Version": "2024-01",
  "X-Environment": "production"
}

Media Send Template

  • Field: send_media_template
  • Type: String (optional)
  • Purpose: Controls how Coval sends an attached image to your agent during WebSocket voice simulations
  • Applies When: A test case includes an image attachment and the persona decides to send it
Default template:
{
  "type": "media",
  "name": "{{media_name}}",
  "mime_type": "{{mime_type}}",
  "data": "{{media_data}}"
}
Template rules:
  • {{media_data}} is required.
  • {{media_name}} and {{mime_type}} are optional placeholders.
  • If the template is exactly {{media_data}}, Coval sends raw bytes.
  • Otherwise, Coval base64-encodes the image and substitutes it into your JSON template.
This setting is only used for WebSocket voice simulations with test cases that include image attachments. See Test Sets for the test-case setup.
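The template rules above can be sketched as follows. This is an approximation for reasoning about what your agent will receive, not Coval's implementation: `render_media` is a hypothetical helper, and it uses plain string substitution, so it assumes the file name and MIME type contain no characters that need JSON escaping.

```python
import base64
import json
from typing import Union

DEFAULT_TEMPLATE = (
    '{"type": "media", "name": "{{media_name}}", '
    '"mime_type": "{{mime_type}}", "data": "{{media_data}}"}'
)


def render_media(template: str, data: bytes, name: str, mime_type: str) -> Union[bytes, str]:
    """Apply the template rules: raw bytes for a bare placeholder, base64 otherwise."""
    if "{{media_data}}" not in template:
        raise ValueError("template must contain {{media_data}}")
    if template == "{{media_data}}":
        return data  # the template is exactly the placeholder: send raw bytes
    encoded = base64.b64encode(data).decode("ascii")
    return (
        template.replace("{{media_name}}", name)
        .replace("{{mime_type}}", mime_type)
        .replace("{{media_data}}", encoded)
    )


raw = render_media("{{media_data}}", b"hi", "a.png", "image/png")
message = json.loads(render_media(DEFAULT_TEMPLATE, b"hi", "a.png", "image/png"))
```

On the agent side, a JSON payload rendered this way needs a base64 decode of the `data` field to recover the image bytes.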

Advanced Configuration

Advanced settings provide fine-grained control over audio sample rates, handshake behavior, and media format.

Sample Rates

Sample rates determine how audio is processed for transmission to and from your agent. Coval handles internal resampling to ensure audio quality, but you can configure the rates for communication with your agent.
| Setting | Default | Allowed Values | Description |
| --- | --- | --- | --- |
| Send Sample Rate | 16,000 Hz | 8,000 / 16,000 / 24,000 / 48,000 Hz | Audio sample rate Coval sends to your agent |
| Receive Sample Rate | 48,000 Hz | 8,000 / 16,000 / 24,000 / 48,000 Hz | Audio sample rate Coval expects from your agent |
The default receive rate (48,000 Hz) is higher than the send rate (16,000 Hz) because many voice integrations output higher-rate audio from the agent while accepting 16 kHz audio as input.
Mismatched sample rates relative to your agent’s actual stream format can cause speed, pitch, or quality issues. Ensure your agent can handle the configured sample rates.
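To see why a mismatch matters, consider what happens when uncompressed PCM produced at one rate is interpreted at another without resampling: playback speed (and pitch) scale by the ratio of the two rates. The sketch below works through that arithmetic. The helper names are hypothetical, and the chunk-size calculation assumes 16-bit mono PCM in 20 ms chunks, neither of which is specified by this page.

```python
def playback_speed_factor(produced_rate_hz: int, assumed_rate_hz: int) -> float:
    """Speed multiplier when samples produced at one rate are played at another."""
    return assumed_rate_hz / produced_rate_hz


def chunk_bytes(sample_rate_hz: int, chunk_ms: int,
                bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Size of one PCM chunk; 16-bit mono is an assumption, not a documented format."""
    samples = sample_rate_hz * chunk_ms // 1000
    return samples * bytes_per_sample * channels


# Coval sends 16 kHz audio, but the agent treats it as 48 kHz: three times too fast.
too_fast = playback_speed_factor(produced_rate_hz=16_000, assumed_rate_hz=48_000)

# A 20 ms chunk at each default rate, under the 16-bit mono assumption.
send_chunk = chunk_bytes(16_000, 20)
receive_chunk = chunk_bytes(48_000, 20)
```

A factor above 1 means the audio sounds sped up and higher pitched; a factor below 1 means it sounds slowed down and lower pitched.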

Handshake

Handshake settings control how Coval waits for your agent to be ready to receive audio.
| Setting | Direct Mode Default | HTTP-First Mode Default | Description |
| --- | --- | --- | --- |
| Ready message type | session_ready | Disabled by default | Message type value that signals your agent is ready |
| Require session ID | Enabled | Disabled | Whether to expect a session ID in the ready message |
| Handshake timeout | 30 seconds | 30 seconds | How long to wait before timing out |
Example ready message:
{
  "type": "session_ready",
  "session_id": "abc123"
}
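The check implied by these settings can be approximated as below. This is a sketch under assumptions: `is_ready_message` is a hypothetical helper, the `type` and `session_id` field names are taken from the example above, and rejecting a ready message that lacks a session ID is this sketch's choice rather than documented Coval behavior.

```python
import json


def is_ready_message(raw: str, ready_type: str = "session_ready",
                     require_session_id: bool = True) -> bool:
    """Return True if a text frame looks like the configured ready message."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return False  # not JSON, so it cannot be the ready message
    if not isinstance(message, dict) or message.get("type") != ready_type:
        return False
    if require_session_id and not message.get("session_id"):
        return False  # assumption: a missing session ID fails the handshake
    return True


ready = is_ready_message('{"type": "session_ready", "session_id": "abc123"}')
missing_id = is_ready_message('{"type": "session_ready"}')
```

When testing your agent locally, running its first outbound message through a check like this can help explain a handshake timeout.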

Audio Format

Audio format settings control how audio data is structured when sent to your agent.
| Setting | Default | Description |
| --- | --- | --- |
| Send audio template | {"type": "audio_chunk", "data": "{{audio_data}}"} | JSON template for audio chunks |
| Message type path | type | Path to the message type field |
| Audio message type value | audio_chunk | Value that identifies an audio chunk |
| Audio data path | data | Path to the actual audio data |
| Audio encoding | pcm | Audio encoding format (pcm or mp3) |
Paths can use dot notation for nested fields, for example: payload.audio.data. These advanced mapping options allow you to customize the audio payload structure to match your agent’s expected format.
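To illustrate how the three mapping settings fit together, the sketch below locates the audio field in a message using a message type path, a type value, and an audio data path. The helper names and the nested message shape (`payload.kind`, `payload.audio.data`) are hypothetical, and the function returns the field as found without decoding it.

```python
from typing import Any, Optional


def get_path(message: Any, path: str) -> Any:
    """Follow a dot-notation path through nested dicts; None if any segment is missing."""
    current = message
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_audio_field(message: dict, type_path: str = "type",
                        type_value: str = "audio_chunk",
                        data_path: str = "data") -> Optional[str]:
    """Return the audio field if the message is an audio chunk, else None."""
    if get_path(message, type_path) != type_value:
        return None
    value = get_path(message, data_path)
    return value if isinstance(value, str) else None


# Default mapping: type is "audio_chunk" and the audio lives in "data".
flat = extract_audio_field({"type": "audio_chunk", "data": "AAAA"})

# Hypothetical nested mapping configured with dot-notation paths.
nested = extract_audio_field(
    {"payload": {"kind": "audio", "audio": {"data": "AAAA"}}},
    type_path="payload.kind",
    type_value="audio",
    data_path="payload.audio.data",
)
```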

Media Format

Media format settings control how image attachments are sent to your agent.
| Setting | Default | Description |
| --- | --- | --- |
| Media send template | See Media Send Template above | JSON template for image attachments |

Setup Instructions

  1. Prepare your WebSocket endpoint
    • Ensure your endpoint accepts secure WebSocket connections (wss://)
    • Configure your server to handle the authorization header if authentication is required
    • Set up message handling for the conversation flow
    • If you want to receive test-case images, decide whether your endpoint expects raw bytes or a JSON payload with base64 media data
    • Configure audio handling for the expected sample rates
  2. Create the agent in Coval
    • Navigate to app.coval.dev/agents/create
    • Select WebSocket as the connection type
    • For Direct Mode: enter your WebSocket endpoint URL (must start with wss://)
    • For HTTP-First Mode: enter your HTTP setup URL (https://), HTTP method/body/headers, and the WebSocket URL response path
    • Add authorization header if your endpoint requires authentication
    • Configure initialization JSON if you need to send a setup payload
    • Add any custom headers required by your endpoint
    • Set Media Send Template if your agent needs a specific image payload shape
    • Configure advanced settings like sample rates if needed
  3. Test the connection
    • Create a simple test set with a few scenarios
    • Add an image attachment to one test case if you want to test multimodal behavior
    • Launch a simulation to verify end-to-end connectivity
    • Check that audio is being sent and received correctly

How Simulations Work

When you launch a simulation with a WebSocket agent, Coval:
  1. HTTP-first setup (if applicable) — If using HTTP-First mode, calls the HTTP endpoint first to get the WebSocket URL
  2. Establishes the connection — Opens a secure WebSocket connection to your endpoint (either the direct endpoint or the URL returned from HTTP setup)
  3. Sends initialization payload — If configured, sends the initialization JSON immediately after connection
  4. Waits for handshake (if configured) — Waits for a ready message from the agent if a ready message type is configured
  5. Exchanges audio — Sends audio chunks and receives audio responses according to the persona behavior, including attached test-case images when relevant
  6. Records the transcript — Captures the full conversation for evaluation
  7. Runs metrics — Evaluates the conversation against your configured metrics
  8. Closes the connection — Cleanly terminates the WebSocket connection

Message Format

Coval exchanges audio either as JSON messages or as raw binary frames, with the following behavior:

Outbound Audio (Coval → Your Agent)

  • By default, Coval sends audio as JSON like {"type": "audio_chunk", "data": "..."}.
  • You can customize that payload under Send Audio Template.
  • If the template is just {{audio_data}}, Coval sends raw bytes.
  • Audio data is base64-encoded when using JSON templates.
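From the agent's side, the rules above mean an incoming frame is either raw audio bytes or a JSON message carrying base64 audio. A minimal decoding sketch, assuming the default template and a WebSocket library that delivers binary frames as `bytes` and text frames as `str` (the helper name `decode_outbound_frame` is hypothetical):

```python
import base64
import json
from typing import Optional, Union


def decode_outbound_frame(frame: Union[bytes, str]) -> Optional[bytes]:
    """Return audio bytes from a frame sent by Coval, or None for non-audio messages."""
    if isinstance(frame, bytes):
        return frame  # raw-bytes template: the frame is the audio itself
    message = json.loads(frame)
    if not isinstance(message, dict) or message.get("type") != "audio_chunk":
        return None  # some other JSON message, such as the initialization payload
    return base64.b64decode(message["data"])


from_binary = decode_outbound_frame(b"\x00\x01")
from_json = decode_outbound_frame('{"type": "audio_chunk", "data": "AAE="}')
```

If you customize the Send Audio Template, adjust the field names in this check to match your template.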

Inbound Audio (Your Agent → Coval)

  • By default, Coval looks for incoming JSON audio messages where type is audio_chunk and the audio bytes are in data.
  • Incoming audio can also be raw binary PCM when your integration sends binary frames.
  • audio_encoding controls how incoming audio is decoded: pcm by default, or mp3.

Technical Requirements

Running Locally

If your WebSocket server is running locally, you’ll need to expose it publicly for Coval to connect. Use a tunneling service like ngrok:
ngrok http 8080
# Use the generated wss:// URL as your endpoint
# Note: if ngrok gives you an https:// tunnel URL, use the corresponding wss:// URL in Coval
Remember to update your agent configuration when your tunnel URL changes. Consider using ngrok’s reserved domains for persistent URLs.
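If you script your local setup, the https-to-wss conversion mentioned above is a scheme swap that keeps the host and path. A small sketch follows; the tunnel hostname is made up and `to_wss` is a hypothetical helper.

```python
from urllib.parse import urlsplit, urlunsplit


def to_wss(url: str) -> str:
    """Convert an https:// tunnel URL to the corresponding wss:// URL."""
    parts = urlsplit(url)
    if parts.scheme not in ("https", "wss"):
        raise ValueError(f"expected an https:// or wss:// URL, got {url!r}")
    return urlunsplit(("wss", parts.netloc, parts.path, parts.query, parts.fragment))


# Hypothetical tunnel hostname; use the one ngrok prints for your session.
endpoint = to_wss("https://abc123.ngrok-free.app/ws/voice")
```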

Troubleshooting

Connection Failures

“Invalid WebSocket URL” error
  • Verify your endpoint starts with wss:// (not ws://, http://, or https://)
  • Check that the URL is properly formatted with no trailing spaces
“Connection refused” error
  • Ensure your WebSocket server is running and accessible
  • Check firewall rules allow inbound connections on the WebSocket port
  • Verify the endpoint URL is correct
“Authentication failed” error
  • Confirm your authorization header value is correct
  • Check that the header format matches what your server expects
  • Verify the API key or token hasn’t expired

Audio Quality Issues

“Audio quality degraded” or “Playback issues”
  • Check your Send Sample Rate and Receive Sample Rate settings match what your agent expects
  • Verify your agent can handle the configured sample rates (8,000 / 16,000 / 24,000 / 48,000 Hz)
  • Ensure audio encoding (pcm vs mp3) is supported by your agent
Sample rate mismatch
  • If your agent expects 48,000 Hz but you set Send Sample Rate to 16,000 Hz, the agent interprets the audio at the wrong rate, which can distort its speed or pitch
  • If your agent sends audio at 8,000 Hz but you set Receive Sample Rate to 48,000 Hz, Coval interprets the audio at the wrong rate, which can distort its speed or pitch

Message Handling Issues

“No response received” error
  • Ensure your agent sends audio responses in the configured JSON format with type and data fields
  • Check that your agent is processing and responding to incoming audio messages
  • Verify there are no errors in your agent’s logs
“Invalid JSON in response” error
  • Confirm your agent returns valid JSON for audio messages
  • Check for proper encoding of special characters
  • Ensure the type and data fields are properly formatted
“Agent receives unreadable audio payload”
  • Verify your audio template matches your agent’s expected format
  • Ensure audio encoding (pcm vs mp3) is supported by your agent
  • Check that audio data is properly base64-encoded when using JSON templates
  • For raw PCM frames, verify the format matches what Coval expects
“Agent receives unreadable media payload”
  • If your agent expects JSON, use a JSON send_media_template
  • If your agent expects raw bytes, set send_media_template to exactly {{media_data}}
  • Remember that {{media_data}} is base64-encoded when used inside a JSON template
  • Include {{mime_type}} and {{media_name}} if your agent needs file metadata

Timeout Issues

Simulation timeouts
  • Verify your agent responds within a reasonable time
  • Check for any blocking operations in your agent’s message handler
  • Monitor your agent’s resource usage during simulations

Best Practices

  1. Use persistent connections — Voice WebSocket agents should maintain the connection throughout the conversation
  2. Handle reconnection gracefully — If your agent supports it, configure automatic reconnection
  3. Log initialization payloads — Track the initialization JSON in your server logs for debugging
  4. Implement health checks — Add a ping/pong mechanism to detect connection issues early
  5. Secure your endpoint — Always use wss:// and implement proper authentication
  6. Match sample rates — Set Send and Receive sample rates to match your agent’s capabilities