How to Generate a Test Set
Quick Start
- Enter your test scenario in the input box.
- (Optional) Add extra context:
  - Attach files (such as text, JSON, or markdown)
  - Choose an agent to evaluate
  - Pick a relevant category from those suggested
- (Optional) Add metadata (see the sketch after this list):
  - Define metadata fields to extract from each test case.
  - Example: key: “ticket_number”, description: “X-###” will generate entries like “X-001” per test case
  - Example: key: “destination”, description: “enter a possible airport code the user is flying to” will generate entries like “SFO”
- Submit using the arrow button or by pressing Enter.
- Review and modify your test set in the test set editor.
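Given the example metadata definitions above, each generated test case might carry entries like the following (a sketch; keys and values are illustrative):

```json
{
  "ticket_number": "X-001",
  "destination": "SFO"
}
```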
Alternative Options
- Upload from file: Use “Upload from file” to import CSV/Excel test cases
- Manual mode: Use “Use manual creation mode” to create a blank test set and add cases yourself
Uploading from CSV/Excel
Import test cases in bulk by uploading a properly formatted CSV or Excel file.
Column Structure
- input (required): The test case input or prompt. The column header is case-insensitive and must be present in your file.
- Expected behaviors: Expected behaviors for the test case. Parsing rules (applied during test-set ingest/validation):
  - JSON array: ["behavior1", "behavior2"] is parsed as an array of behaviors
  - Comma-separated string: "behavior1,behavior2" is split on commas into multiple behaviors
  - Single string: "behavior1" is treated as a single behavior
- Type: The test case type. Case-insensitive. Accepts SCENARIO or TRANSCRIPT.
- Metadata: A JSON object containing test case metadata.
- agent_ids: Agent IDs to associate with the test set. This value is test-set level (it applies to all test cases):
  - JSON array of strings: ["agent-id-1", "agent-id-2"] is parsed as an array of agent ID strings
  - Comma-separated string: "agent-id-1,agent-id-2"; values are trimmed and empty values are filtered out
  - The first non-empty value found in the file is used (since agent_ids applies to the whole test set)
- Knowledge base: Knowledge base entries to attach to test cases:
  - JSON array of objects: [{"id": "entry-id-1", "type": "web_url"}, {"id": "entry-id-2"}]; each object can have id (required) and type (optional)
  - JSON array of strings: ["entry-id-1", "entry-id-2"] is treated as entry IDs with the default type
  - Comma-separated string: "entry-id-1:web_url,entry-id-2,entry-id-3:pdf"; each entry can be formatted as id:type or just id (default type)
  - Single string: "entry-id-1" is treated as an entry ID with the default type
  - Accepted types: web_url (default), plain_text, json, zendesk, shelf, file
- Any additional column headers are automatically treated as metadata fields.
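A minimal example CSV (a sketch; the input and agent_ids headers follow the descriptions above, while the remaining header names are illustrative, so match them to your workspace's expected columns):

```csv
input,expected_behaviors,type,metadata,agent_ids
"Call to get a refund","[""ask for the order number"", ""confirm the refund amount""]",SCENARIO,"{""destination"": ""SFO""}","agent-id-1,agent-id-2"
"Ask to move a flight to the 20th","confirm the new date",SCENARIO,"{""ticket_number"": ""X-002""}",
```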
File Requirements
Your file must meet the following criteria:
- Accepted formats: .csv or .xlsx
- Maximum file size: 10 MB
- First row: Must contain column headers (case-insensitive)
- Empty rows: Automatically skipped during import
- Validation: Rows with empty input values are filtered out
Understanding Test Cases
Test Case Input
Each test case uses one of three input types that determine how the simulated user behaves during a run:
| Type | What it is | Simulated user behavior |
|---|---|---|
| Scenario | High-level intent | Improvises freely toward the goal |
| Transcript | A reference conversation | Adapts as needed to match the flow |
| Script | Exact turns | Follows them precisely, word for word |
1. Scenarios
Define specific tasks or behaviors for your simulated user. Use quotation marks for exact phrases you want them to say. Examples:
- Simple task: “Call to get a refund”
- Complex scenario: “First, ask for PTO from the 21st to the 22nd of March. After receiving a confirmation, ask to change to the 20th to 22nd. During the verification, share your email address as ‘emily [at] gmail [dot] com’. Then, proceed to correct yourself with ‘oh no - it’s actually emily [dot] marc [at] gmail [dot] com’.”
2. Transcript
Recreate specific conversations using the OpenAI transcript format. The simulated user will follow the user’s part of the transcript as closely as possible. Format example:
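A minimal sketch in the OpenAI chat-messages format (role/content pairs; the exact envelope may vary):

```json
{
  "messages": [
    {"role": "user", "content": "Hi, I need to change my flight."},
    {"role": "assistant", "content": "Sure, could I get your booking reference?"},
    {"role": "user", "content": "It's ABC123."}
  ]
}
```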
3. Audio Upload
Upload a pre-recorded audio file containing the persona’s side of the conversation (right channel) to use during a voice simulation. Instead of the persona generating responses with an LLM and TTS, the uploaded audio plays back exactly as recorded, making the test fully deterministic. Supported formats: .wav, .mp3 (max 200 MB, duration 5 seconds to 1 hour).
How it works:
- In the test set editor, select Audio as the input type and upload your audio file containing the persona’s speech (right channel only)
- The file is played back during simulation in place of LLM-generated persona speech
- The uploaded audio is automatically transcribed so persona turns still appear in the transcript
- After the audio finishes playing, the simulation waits 30 seconds for the agent to finish responding, then ends the call
Ground Truth Transcript
To measure your agent’s STT accuracy against a known-correct transcript of the uploaded audio, you can provide a ground truth transcript in two ways:
- Via the UI: when uploading an audio file, the modal includes a ground truth transcript field where you can either paste the transcript as plain text or upload a .txt or .json file.
- Via metadata: add a ground_truth_transcript key to the test case metadata directly.
Either method enables the STT Word Error Rate (Audio Upload) metric, which compares your agent’s speech-to-text output against this reference text. The ground truth can be plain text, labeled text with timestamps and role labels, or a JSON object with a messages array.
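For instance, a JSON ground truth with a messages array might look like this (a sketch; the exact schema is assumed from the description above):

```json
{
  "messages": [
    {"role": "user", "content": "Hi, I'd like to check my account balance."},
    {"role": "user", "content": "Yes, my account number is 12345."}
  ]
}
```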
4. Script
Define an ordered list of exact lines for the persona to deliver, turn by turn. The persona follows the script exactly rather than generating responses with an LLM, while still using the configured persona voice and background sounds. Example script turns:
- “Hi, I’d like to check my account balance.”
- “Yes, my account number is 12345.”
- “Thank you, goodbye.”
How it works:
- In the test set editor, select Script as the input type
- Add ordered turn texts in the script editor (each turn is one persona utterance)
- During simulation, the persona delivers each line in order instead of generating LLM responses
- A divergence detector monitors agent responses; if the agent diverges significantly from the expected flow, the simulation can end early with a SCRIPT_DIVERGED reason
- After the last scripted turn is delivered, the agent gets one final response before the simulation ends with a SCRIPT_COMPLETED reason
5. Image Attachment
Attach a single image to a test case so the persona can share it during a WebSocket voice simulation. This is useful for flows like sending a receipt, damage photo, insurance card, or product image after the agent asks for visual proof or context. Supported formats: .png, .jpg, .jpeg (max 2 MB, one image per test case).
Before using image attachments, make sure the test set is attached to a WebSocket voice agent. The image will only be sent when this test set is used with that agent type.
How it works:
- In the test set editor, open a test case and click Add Media.
- Upload a PNG or JPEG image and give it a short Name such as receipt_photo or broken_screen.
- Optionally add a Description telling the persona when the image should be sent.
- Attach the test set to a WebSocket voice agent with a media send template configured.
- Launch the run using that attached WebSocket voice agent.
- During the conversation, Coval can send the image when the agent asks for relevant visual information.
Image attachments augment a normal test case input rather than replacing it. You still define the scenario, transcript, script, or audio flow as usual, and the image is available as an additional artifact the persona can send when needed.
Tips:
- Use short, stable names like receipt_photo or drivers_license_front.
- Use the description to explain when to send the image, not just what the file contains.
- Keep the image tightly scoped to the task so the agent receives only the evidence it needs.
Test-Case Specific Evaluation
Expected Behavior and Metadata let you use test-case-specific data to evaluate how the agent responds to a particular test case.
Test Case Expected Behavior
The expected behavior describes how your agent should respond to the user’s requests. Examples:
- “the agent should ask the user for their phone number”
- “the agent should repeat the phone number back to the user”
Test Case Metadata
These fields store specific metadata about a test case. This is helpful when you want to create a metric that references a specific aspect of the test case. You can input metadata as key/value pairs or as JSON.
Example: Imagine an airline help desk where a test case stores the metadata key destination with the value “SFO”. A metric can then reference that field with a template variable, for example: “Did the agent correctly confirm the user’s flight to {{test_case.destination}}?”
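That metadata expressed as JSON (a sketch; keys and values are illustrative):

```json
{
  "destination": "SFO",
  "ticket_number": "X-042"
}
```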
Recommended Test Set Types
For comprehensive testing, create multiple types of test sets:
- Regression Set: Contains “happy path” scenarios representing typical successful interactions
- Adversarial Set: Contains edge cases and scenarios designed to test your agent’s limits and handling of unusual requests
Utilizing Agent Attributes
In your agents, you can set specific attributes associated with that agent. You can embed these agent attributes into your scenarios with this format: {{agent.attribute_name}}
Example:
Imagine one agent has the attribute location with a value “San Francisco”, and another agent has the value “London”.
Embed those agent attributes in your scenarios and expected behaviors like this:
Scenario: You are a user calling for travel recommendations in {{agent.location}}
Expected Behavior: The agent should only give travel recommendations in {{agent.location}}
Test Cases vs. Personas
- Persona: Defines how to behave
  - Characteristics (friendly, angry)
  - Voice configuration
  - Can be assigned multiple test cases
- Test Case: Defines what to do
  - Specific tasks or scenarios
  - Can be assigned to any persona

