Overview
WebSocket connections enable real-time, bidirectional audio communication with your AI agents. This connection type is ideal for voice-based agents, real-time assistants, or any application that requires persistent, low-latency audio exchange.
When you connect a WebSocket agent, Coval establishes a secure connection to your endpoint and handles the full audio conversation flow—sending audio chunks, receiving responses, and evaluating performance.
Connection Modes
WebSocket agents support two connection modes:
Direct Mode (Default)
Connect directly to a WebSocket endpoint.
| Field | Required | Description |
|---|---|---|
| WebSocket Endpoint | Yes | The wss:// URL to connect to |
| Initialization JSON | No | JSON payload sent immediately after connection |
| Authorization Header | No | Auth value sent during the WebSocket handshake |
| Custom Headers | No | Additional headers for the WebSocket handshake |
HTTP-First Mode
Call an HTTP endpoint first to create a session, then connect to the WebSocket URL returned in the response. This mode is common with platforms that require session provisioning before a WebSocket connection can be established.
| Field | Required | Description |
|---|---|---|
| HTTP Endpoint URL | Yes | The https:// URL to call for session setup |
| HTTP Method | No | Request method (default: POST) |
| Request Body | No | JSON body for the HTTP request |
| HTTP Headers | No | Headers for the HTTP request |
| WebSocket URL Response Path | Yes | Dot-notation path to the WebSocket URL in the response |
| Authorization Header | No | Auth value for the WebSocket connection (separate from HTTP headers) |
| Initialization JSON | No | JSON payload sent after WebSocket connection |
| Custom Headers | No | Additional headers for the WebSocket connection |
Flow:
- Coval sends an HTTP request to your setup endpoint
- Your API returns a response containing the WebSocket URL
- Coval connects to that WebSocket URL
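The WebSocket URL Response Path step can be pictured with a small sketch. It assumes the path addresses nested JSON objects only; whether Coval also accepts list indices in a path is not stated here. The helper name `resolve_path` and the response field names are hypothetical.

```python
from typing import Any


def resolve_path(payload: Any, path: str) -> Any:
    """Follow a dot-notation path such as 'data.session.ws_url' through nested dicts."""
    current = payload
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"segment {key!r} not found while resolving {path!r}")
        current = current[key]
    return current


# Hypothetical setup response; the field names are illustrative only.
setup_response = {
    "data": {"session": {"id": "abc123", "ws_url": "wss://your-api.com/ws/voice/abc123"}}
}
ws_url = resolve_path(setup_response, "data.session.ws_url")
print(ws_url)
```

With a response shaped like this, the WebSocket URL Response Path would be data.session.ws_url.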
Configuration Requirements
WebSocket Endpoint
- Field: endpoint
- Type: String (required)
- Purpose: The WebSocket URL that Coval connects to for simulations
- Format: Must start with wss:// (secure WebSocket)
- Example: wss://your-api.com/ws/voice
Only secure WebSocket connections (wss://) are supported. Plain ws:// endpoints will be rejected for security reasons.
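To catch a bad endpoint before saving the agent, a local pre-check along these lines may help. It is only a sketch of the rule stated above (scheme must be wss, no stray whitespace); the function name is hypothetical and Coval's own validation may differ.

```python
from urllib.parse import urlparse


def is_valid_wss_endpoint(url: str) -> bool:
    """Return True when the URL uses the wss scheme, has a host, and has no stray whitespace."""
    if url != url.strip():
        return False  # leading or trailing spaces
    parsed = urlparse(url)
    return parsed.scheme == "wss" and bool(parsed.netloc)


for candidate in ("wss://your-api.com/ws/voice", "ws://your-api.com/ws/voice", "https://your-api.com"):
    print(candidate, is_valid_wss_endpoint(candidate))
```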
Initialization JSON
- Field: initialization_json
- Type: JSON object (optional)
- Purpose: Initial payload sent to your WebSocket endpoint when the connection is established
- Format: Valid JSON object
- Use Cases: Session initialization, authentication handshakes, context setup
Example:
{
"action": "start_session",
"session_type": "simulation",
"metadata": {
"source": "coval",
"test_mode": true
}
}
The initialization JSON is sent immediately after the WebSocket connection is established. Use this to configure your agent’s behavior or authenticate the session.
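On the agent side, the first text frame can be parsed roughly as follows. This sketch assumes the payload has the shape of the example above; the function name and the returned dictionary are hypothetical, and your agent is free to define any payload it likes.

```python
import json


def parse_initialization(raw_message: str) -> dict:
    """Parse the first text frame and pull out the fields used in the example payload."""
    payload = json.loads(raw_message)
    if not isinstance(payload, dict):
        raise ValueError("initialization payload must be a JSON object")
    metadata = payload.get("metadata")
    if not isinstance(metadata, dict):
        metadata = {}  # tolerate a missing or malformed metadata block
    return {
        "action": payload.get("action"),
        "session_type": payload.get("session_type"),
        "test_mode": bool(metadata.get("test_mode", False)),
    }


first_frame = (
    '{"action": "start_session", "session_type": "simulation", '
    '"metadata": {"source": "coval", "test_mode": true}}'
)
print(parse_initialization(first_frame))
```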
Authorization Header
- Field: authorization_header
- Type: String (optional)
- Purpose: Authentication value sent in the Authorization header during the WebSocket handshake
- Format: Standard authorization header value
- Security: Stored encrypted and handled securely
Common formats:
- Bearer token: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...
- API key: X-API-Key your-api-key-here
- Basic auth: Basic base64-encoded-credentials
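If you need to produce one of these values yourself, the Basic form is the base64 encoding of username:password. The sketch below builds the Basic and Bearer variants; the helper names and the sample credentials are hypothetical.

```python
import base64


def basic_auth_value(username: str, password: str) -> str:
    """Build a Basic authorization value from a username and password."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def bearer_auth_value(token: str) -> str:
    """Build a Bearer authorization value from an existing token."""
    return f"Bearer {token}"


# Placeholder credentials for illustration only.
print(basic_auth_value("user", "pass"))
print(bearer_auth_value("your-token-here"))
```

Paste the resulting string into the Authorization Header field as-is.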
Custom Headers
- Field: custom_headers
- Type: JSON object (optional)
- Purpose: Additional HTTP headers sent during the WebSocket handshake
- Format: Valid JSON object with string key-value pairs
- Use Cases: Custom authentication, routing headers, client identification
Example:
{
"X-Client-ID": "coval-simulation",
"X-API-Version": "2024-01",
"X-Environment": "production"
}
Media Send Template
- Field: send_media_template
- Type: String (optional)
- Purpose: Controls how Coval sends an attached image to your agent during WebSocket voice simulations
- Applies When: A test case includes an image attachment and the persona decides to send it
Default template:
{
"type": "media",
"name": "{{media_name}}",
"mime_type": "{{mime_type}}",
"data": "{{media_data}}"
}
Template rules:
- {{media_data}} is required.
- {{media_name}} and {{mime_type}} are optional placeholders.
- If the template is exactly {{media_data}}, Coval sends raw bytes.
- Otherwise, Coval base64-encodes the image and substitutes it into your JSON template.
This setting is only used for WebSocket voice simulations with test cases that include image attachments. See Test Sets for the test-case setup.
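The template rules can be approximated with the sketch below, which is useful for predicting what your agent will receive. It is a reading of the rules above, not Coval's actual implementation; the function name is hypothetical, and the JSON-escaping of the name and MIME type is an assumption of this sketch.

```python
import base64
import json
from typing import Union

RAW_TEMPLATE = "{{media_data}}"
DEFAULT_TEMPLATE = (
    '{"type": "media", "name": "{{media_name}}", '
    '"mime_type": "{{mime_type}}", "data": "{{media_data}}"}'
)


def render_media_message(template: str, data: bytes, name: str, mime_type: str) -> Union[bytes, str]:
    """Render an image attachment according to the template rules described above."""
    if RAW_TEMPLATE not in template:
        raise ValueError("template must contain {{media_data}}")
    if template == RAW_TEMPLATE:
        return data  # raw bytes, sent as a binary frame
    encoded = base64.b64encode(data).decode("ascii")

    def escape(value: str) -> str:
        # Assumption: escape so the substituted text stays valid inside a JSON string.
        return json.dumps(value)[1:-1]

    return (
        template.replace("{{media_data}}", encoded)
        .replace("{{media_name}}", escape(name))
        .replace("{{mime_type}}", escape(mime_type))
    )


print(render_media_message(DEFAULT_TEMPLATE, b"hi", "photo.png", "image/png"))
```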
Advanced Configuration
Advanced settings provide fine-grained control over audio sample rates, handshake behavior, and media format.
Sample Rates
Sample rates determine how audio is processed for transmission to and from your agent. Coval handles internal resampling to ensure audio quality, but you can configure the rates for communication with your agent.
| Setting | Default | Allowed Values | Description |
|---|---|---|---|
| Send Sample Rate | 16,000 Hz | 8,000 / 16,000 / 24,000 / 48,000 Hz | Audio sample rate Coval sends to your agent |
| Receive Sample Rate | 48,000 Hz | 8,000 / 16,000 / 24,000 / 48,000 Hz | Audio sample rate Coval expects from your agent |
The default receive rate (48,000 Hz) is higher than the default send rate (16,000 Hz) because many voice agents output higher-rate audio while accepting 16 kHz input.
Mismatched sample rates relative to your agent’s actual stream format can cause speed, pitch, or quality issues. Ensure your agent can handle the configured sample rates.
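The effect of a mismatch is simple arithmetic. The sketch below assumes 16-bit mono PCM (2 bytes per sample), which this page does not specify, so adjust the constant if your stream differs; the function names are hypothetical.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit mono PCM


def pcm_duration_seconds(num_bytes: int, sample_rate_hz: int) -> float:
    """Duration of a PCM buffer when played at the given sample rate."""
    return num_bytes / (sample_rate_hz * BYTES_PER_SAMPLE)


def playback_speed_factor(actual_rate_hz: int, assumed_rate_hz: int) -> float:
    """Values above 1 mean the audio plays faster (and higher-pitched) than intended."""
    return assumed_rate_hz / actual_rate_hz


one_second_at_16k = 16_000 * BYTES_PER_SAMPLE  # 32,000 bytes
print(pcm_duration_seconds(one_second_at_16k, 16_000))  # interpreted at the correct rate
print(pcm_duration_seconds(one_second_at_16k, 48_000))  # interpreted at the wrong rate
print(playback_speed_factor(16_000, 48_000))
```

Under these assumptions, 16 kHz audio read as 48 kHz lasts a third as long, so it plays three times too fast.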
Handshake
Handshake settings control how Coval waits for your agent to be ready to receive audio.
| Setting | Direct Mode Default | HTTP-First Mode Default | Description |
|---|---|---|---|
| Ready message type | session_ready | Disabled by default | Message type value that signals your agent is ready |
| Require session ID | Enabled | Disabled | Whether to expect a session ID in the ready message |
| Handshake timeout | 30 seconds | 30 seconds | How long to wait before timing out |
Example ready message:
{
"type": "session_ready",
"session_id": "abc123"
}
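A check equivalent to the handshake settings might look like the sketch below. The `session_id` field name is taken from the example ready message; how Coval validates the message internally is not documented here, so treat the function and its parameters as hypothetical.

```python
import json


def is_ready_message(raw: str, ready_type: str = "session_ready", require_session_id: bool = True) -> bool:
    """Return True when a text frame looks like the configured ready message."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(message, dict) or message.get("type") != ready_type:
        return False
    if require_session_id and not message.get("session_id"):
        return False
    return True


print(is_ready_message('{"type": "session_ready", "session_id": "abc123"}'))
print(is_ready_message('{"type": "session_ready"}'))
print(is_ready_message('{"type": "session_ready"}', require_session_id=False))
```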
Audio Format
Audio format settings control how audio data is structured when sent to your agent.
| Setting | Default | Description |
|---|---|---|
| Send audio template | {"type": "audio_chunk", "data": "{{audio_data}}"} | JSON template for audio chunks |
| Message type path | type | Path to the message type field |
| Audio message type value | audio_chunk | Value that identifies an audio chunk |
| Audio data path | data | Path to the actual audio data |
| Audio encoding | pcm | Audio encoding format (pcm or mp3) |
Paths can use dot notation for nested fields, for example: payload.audio.data.
These advanced mapping options allow you to customize the audio payload structure to match your agent’s expected format.
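To see how the message type path, audio message type value, and audio data path fit together, consider this sketch. It assumes base64-encoded audio inside a JSON message; the nested message shape and the function names are hypothetical.

```python
import base64
import json
from typing import Any, Optional


def get_path(obj: Any, path: str) -> Any:
    """Resolve a dot-notation path through nested dicts, returning None when a key is missing."""
    for key in path.split("."):
        if not isinstance(obj, dict):
            return None
        obj = obj.get(key)
    return obj


def extract_audio(raw: str, type_path: str = "type", audio_type: str = "audio_chunk",
                  data_path: str = "data") -> Optional[bytes]:
    """Return decoded audio bytes if the message is an audio chunk, else None."""
    message = json.loads(raw)
    if get_path(message, type_path) != audio_type:
        return None
    encoded = get_path(message, data_path)
    if not isinstance(encoded, str):
        return None
    return base64.b64decode(encoded)


# Hypothetical nested format: type under event.kind, audio under payload.audio.data.
nested = '{"event": {"kind": "audio_chunk"}, "payload": {"audio": {"data": "AAEC"}}}'
print(extract_audio(nested, type_path="event.kind", data_path="payload.audio.data"))
```

For a message shaped like this one, you would set Message type path to event.kind and Audio data path to payload.audio.data.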
Media Format
Media format settings control how image attachments are sent to your agent.
| Setting | Default | Description |
|---|---|---|
| Media send template | See above | JSON template for image attachments |
Setup Instructions
- Prepare your WebSocket endpoint
  - Ensure your endpoint accepts secure WebSocket connections (wss://)
  - Configure your server to handle the authorization header if authentication is required
  - Set up message handling for the conversation flow
  - If you want to receive test-case images, decide whether your endpoint expects raw bytes or a JSON payload with base64 media data
  - Configure audio handling for the expected sample rates
- Create the agent in Coval
  - Navigate to app.coval.dev/agents/create
  - Select WebSocket as the connection type
  - For Direct Mode: enter your WebSocket endpoint URL (must start with wss://)
  - For HTTP-First Mode: enter your HTTP setup URL (https://), HTTP method/body/headers, and the WebSocket URL response path
  - Add authorization header if your endpoint requires authentication
  - Configure initialization JSON if you need to send a setup payload
  - Add any custom headers required by your endpoint
  - Set Media Send Template if your agent needs a specific image payload shape
  - Configure advanced settings like sample rates if needed
- Test the connection
  - Create a simple test set with a few scenarios
  - Add an image attachment to one test case if you want to test multimodal behavior
  - Launch a simulation to verify end-to-end connectivity
  - Check that audio is being sent and received correctly
How Simulations Work
When you launch a simulation with a WebSocket agent, Coval:
- HTTP-first setup (if applicable) — If using HTTP-First mode, calls the HTTP endpoint first to get the WebSocket URL
- Establishes the connection — Opens a secure WebSocket connection to your endpoint (either the direct endpoint or the URL returned from HTTP setup)
- Sends initialization payload — If configured, sends the initialization JSON immediately after connection
- Waits for handshake (if configured) — Waits for a ready message from the agent if a ready message type is configured
- Exchanges audio — Sends audio chunks and receives audio responses according to the persona behavior, including attached test-case images when relevant
- Records the transcript — Captures the full conversation for evaluation
- Runs metrics — Evaluates the conversation against your configured metrics
- Closes the connection — Cleanly terminates the WebSocket connection
Coval sends and expects audio messages in JSON format or raw binary, with the following behavior:
Outbound Audio (Coval → Your Agent)
- By default, Coval sends audio as JSON like {"type": "audio_chunk", "data": "..."}.
- You can customize that payload under Send Audio Template.
- If the template is just {{audio_data}}, Coval sends raw bytes.
- Audio data is base64-encoded when using JSON templates.
Inbound Audio (Your Agent → Coval)
- By default, Coval looks for incoming JSON audio messages where type is audio_chunk and the audio bytes are in data.
- Incoming audio can also be raw binary PCM when your integration sends binary frames.
- audio_encoding controls how incoming audio is decoded: pcm by default, or mp3.
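For the default inbound format, an agent could wrap its PCM output as shown in this sketch. It assumes 16-bit little-endian mono PCM, which this page does not state explicitly, and uses a generated test tone in place of real speech; both function names are hypothetical.

```python
import base64
import json
import math
import struct


def sine_pcm16(frequency_hz: float, duration_s: float, sample_rate_hz: int) -> bytes:
    """Generate a test tone as 16-bit little-endian mono PCM (an assumed format)."""
    count = round(duration_s * sample_rate_hz)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * frequency_hz * i / sample_rate_hz))
        for i in range(count)
    )
    return struct.pack(f"<{count}h", *samples)


def audio_chunk_message(pcm: bytes) -> str:
    """Wrap PCM bytes in the default JSON shape: type=audio_chunk, base64 audio in data."""
    return json.dumps({"type": "audio_chunk", "data": base64.b64encode(pcm).decode("ascii")})


pcm = sine_pcm16(440.0, 0.02, 48_000)  # 20 ms at the default receive rate
message = audio_chunk_message(pcm)
print(len(pcm), message[:60])
```

Sending the same bytes as a binary frame instead of JSON corresponds to the raw binary PCM option listed above.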
Technical Requirements
Running Locally
If your WebSocket server is running locally, you’ll need to expose it publicly for Coval to connect. Use a tunneling service like ngrok:
ngrok http 8080
# Use the generated wss:// URL as your endpoint
# Note: if ngrok gives you an https:// tunnel URL, use the corresponding wss:// URL in Coval
Remember to update your agent configuration when your tunnel URL changes. Consider using ngrok’s reserved domains for persistent URLs.
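Converting a tunnel URL to its wss:// form is a scheme swap, which the sketch below does with the standard library. The hostname is a placeholder and the `/ws/voice` path is only an example; use whatever path your server actually serves.

```python
from urllib.parse import urlsplit, urlunsplit


def to_wss_url(tunnel_url: str, path: str = "") -> str:
    """Swap an https:// tunnel URL to wss://, optionally replacing the path."""
    parts = urlsplit(tunnel_url)
    if parts.scheme not in ("https", "wss"):
        raise ValueError("expected an https:// or wss:// URL")
    return urlunsplit(("wss", parts.netloc, path or parts.path, parts.query, ""))


# Placeholder hostname; your tunnel will print its own.
print(to_wss_url("https://example.ngrok.app", "/ws/voice"))
```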
Troubleshooting
Connection Failures
“Invalid WebSocket URL” error
- Verify your endpoint starts with wss:// (not ws://, http://, or https://)
- Check that the URL is properly formatted with no trailing spaces
“Connection refused” error
- Ensure your WebSocket server is running and accessible
- Check firewall rules allow inbound connections on the WebSocket port
- Verify the endpoint URL is correct
“Authentication failed” error
- Confirm your authorization header value is correct
- Check that the header format matches what your server expects
- Verify the API key or token hasn’t expired
Audio Quality Issues
“Audio quality degraded” or “Playback issues”
- Check your Send Sample Rate and Receive Sample Rate settings match what your agent expects
- Verify your agent can handle the configured sample rates (8,000 / 16,000 / 24,000 / 48,000 Hz)
- Ensure audio encoding (pcm vs mp3) is supported by your agent
Sample rate mismatch
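- If your agent expects 48,000 Hz but Send Sample Rate is set to 16,000 Hz, the agent reads the samples at the wrong rate, so the speech it receives can sound sped up and higher-pitched
- If your agent sends audio at 8,000 Hz but Receive Sample Rate is set to 48,000 Hz, Coval reads the samples at the wrong rate, so the agent's audio can sound sped up and higher-pitched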
Message Handling Issues
“No response received” error
- Ensure your agent sends audio responses in the configured JSON format with type and data fields
- Check that your agent is processing and responding to incoming audio messages
- Verify there are no errors in your agent’s logs
“Invalid JSON in response” error
- Confirm your agent returns valid JSON for audio messages
- Check for proper encoding of special characters
- Ensure the type and data fields are properly formatted
“Agent receives unreadable audio payload”
- Verify your audio template matches your agent’s expected format
- Ensure audio encoding (pcm vs mp3) is supported by your agent
- Check that audio data is properly base64-encoded when using JSON templates
- For raw PCM frames, verify the format matches what Coval expects
“Agent receives unreadable media payload”
- If your agent expects JSON, use a JSON send_media_template
- If your agent expects raw bytes, set send_media_template to exactly {{media_data}}
- Remember that {{media_data}} is base64-encoded when used inside a JSON template
- Include {{mime_type}} and {{media_name}} if your agent needs file metadata
Timeout Issues
Simulation timeouts
- Verify your agent responds within a reasonable time
- Check for any blocking operations in your agent’s message handler
- Monitor your agent’s resource usage during simulations
Best Practices
- Use persistent connections: Voice WebSocket agents should maintain the connection throughout the conversation
- Handle reconnection gracefully: If your agent supports it, configure automatic reconnection
- Log initialization payloads: Track the initialization JSON in your server logs for debugging
- Implement health checks: Add a ping/pong mechanism to detect connection issues early
- Secure your endpoint: Always use wss:// and implement proper authentication
- Match sample rates: Set Send and Receive sample rates to match your agent’s capabilities