# Coval

> Coval is the reliability infrastructure for conversational AI agents. Evaluate voice and text AI agents by running simulated conversations and measuring performance with metrics. Supports inbound voice, outbound voice, chat, SMS, WebSocket, Pipecat, and LiveKit agent types.

## Getting Started

- [Welcome](https://docs.coval.dev/getting-started/welcome): What Coval does and who it's built for
- [Quick Start](https://docs.coval.dev/getting-started/quick-start): Set up your first evaluation in 5 minutes
- [GitHub Actions](https://docs.coval.dev/getting-started/github-actions-tutorial): CI/CD integration for automated evaluations on every PR

## Concepts

- [Agents](https://docs.coval.dev/concepts/agents/overview): Connect voice and chat AI agents to Coval for evaluation
- [Mutations](https://docs.coval.dev/concepts/agents/mutations): Test agent configuration variants side-by-side (A/B testing)
- [Attributes](https://docs.coval.dev/concepts/attributes/overview): Tag and filter resources with custom attributes
- [Personas](https://docs.coval.dev/concepts/personas/overview): Configure simulated callers with voice, accent, behavior, and background noise
- [Test Sets](https://docs.coval.dev/concepts/test-sets/overview): Define scenarios, transcripts, scripts, and expected behaviors for evaluation
- [Metrics](https://docs.coval.dev/concepts/metrics/overview): Measure agent performance with LLM judges, audio analysis, regex, and tool call checks
- [Built-in Metrics](https://docs.coval.dev/concepts/metrics/built-in-metrics): Pre-built metrics for latency, interruptions, sentiment, call resolution, and audio quality
- [Metric Prompting](https://docs.coval.dev/concepts/metrics/prompting): Writing effective LLM judge prompts and expected behaviors
- [Metric Chaining](https://docs.coval.dev/concepts/metrics/MetricChaining): Chain metrics for complex multi-step evaluation logic
- [Human Review](https://docs.coval.dev/concepts/metrics/human-review/human-review): Human-in-the-loop review for metric calibration
- [Templates](https://docs.coval.dev/concepts/templates/overview): Reusable evaluation configs bundling agent, test set, persona, and metrics
- [Simulations](https://docs.coval.dev/concepts/simulations/overview): Launch and analyze simulated conversations between your agent and personas
- [Multi-Run Analysis](https://docs.coval.dev/concepts/simulations/multi-run-analysis): Compare results across multiple evaluation runs
- [OpenTelemetry Traces](https://docs.coval.dev/concepts/simulations/traces/opentelemetry): Correlate evaluation results with production traces

## Agent Connections

- [Inbound Voice](https://docs.coval.dev/concepts/agents/connections/inbound-voice): Test agents that receive incoming phone calls
- [Outbound Voice](https://docs.coval.dev/concepts/agents/connections/outbound-voice): Test agents that make outbound calls
- [Chat (OpenAI Endpoint)](https://docs.coval.dev/concepts/agents/connections/openai-endpoint): Connect OpenAI-compatible chat APIs
- [Chat WebSocket](https://docs.coval.dev/concepts/agents/connections/chat-websocket): Text chat over persistent WebSocket connections
- [Pipecat](https://docs.coval.dev/concepts/agents/connections/pipecat): Integrate with Pipecat Cloud agents
- [LiveKit](https://docs.coval.dev/concepts/agents/connections/livekit): Real-time communication platform integration
- [WebSocket](https://docs.coval.dev/concepts/agents/connections/websocket): Generic WebSocket agent connection

## Observability

- [Dashboards](https://docs.coval.dev/concepts/dashboard/overview): Visualize performance trends across evaluation runs
- [Monitoring](https://docs.coval.dev/concepts/monitoring/overview): Evaluate production conversations with live monitoring
- [Improving Metrics with Human Review](https://docs.coval.dev/guides/improving-metrics-with-human-review): Calibrate metrics using human feedback loops

## Guides

- [Inbound Voice Simulations](https://docs.coval.dev/guides/simulations/inbound-voice): Step-by-step guide for inbound voice evaluations
- [Chat Simulations](https://docs.coval.dev/guides/simulations/chat): Step-by-step guide for chat agent evaluations
- [SMS Simulations](https://docs.coval.dev/guides/simulations/sms): Step-by-step guide for SMS agent evaluations
- [Outbound Voice](https://docs.coval.dev/guides/outbound-voice): Guide for outbound voice agent testing
- [API Keys](https://docs.coval.dev/guides/api-keys): Managing API keys for programmatic access
- [Scheduled Runs](https://docs.coval.dev/guides/scheduled-runs): Set up recurring automated evaluations
- [Webhooks](https://docs.coval.dev/guides/webhooks): Receive real-time notifications for run events
- [Observability](https://docs.coval.dev/guides/observability): OpenTelemetry traces and production monitoring setup
- [Human Review API](https://docs.coval.dev/guides/human-review-api): Human-in-the-loop review workflows via API

## CLI

- [Overview](https://docs.coval.dev/cli/overview): Command-line interface for evaluation, scripting, and CI/CD
- [Installation](https://docs.coval.dev/cli/installation): Install via Homebrew, Cargo, or binary download
- [Agents](https://docs.coval.dev/cli/agents): Create, list, update, and delete agents
- [Runs](https://docs.coval.dev/cli/runs): Launch evaluations, watch progress, and view results
- [Simulations](https://docs.coval.dev/cli/simulations): Inspect individual simulation results and download audio
- [Test Sets](https://docs.coval.dev/cli/test-sets): Manage test set collections
- [Test Cases](https://docs.coval.dev/cli/test-cases): Define evaluation inputs and expected outputs, with bulk import
- [Personas](https://docs.coval.dev/cli/personas): Configure simulated callers with voice and behavior
- [Metrics](https://docs.coval.dev/cli/metrics): Define scoring criteria (LLM judge, audio, regex, tool call)
- [Mutations](https://docs.coval.dev/cli/mutations): Manage agent configuration variants
- [Run Templates](https://docs.coval.dev/cli/run-templates): Save reusable evaluation configurations
- [Scheduled Runs](https://docs.coval.dev/cli/scheduled-runs): Schedule recurring evaluations with cron expressions
- [Dashboards](https://docs.coval.dev/cli/dashboards): Create dashboards and widgets from the CLI
- [API Keys](https://docs.coval.dev/cli/api-keys): Manage API keys
- [Human Review](https://docs.coval.dev/cli/human-review): Manage human review projects and annotations

## AI Agents

- [Evaluations for Agents](https://docs.coval.dev/agents/overview): Give AI coding agents the tools and knowledge to evaluate AI quality via Skills, MCP, CLI, or API
- [Guided Onboarding](https://docs.coval.dev/agents/onboarding): Run /onboard to set up a complete evaluation interactively, from connecting your agent to viewing results
- [Agent Skills](https://docs.coval.dev/agents/skills): Install evaluation expertise into your AI coding agent with one command (npx skills add coval-ai/coval-external-skills)

## MCP Server

- [Overview](https://docs.coval.dev/mcp/overview): Model Context Protocol server for LLM tool access
- [Installation](https://docs.coval.dev/mcp/installation): Set up the MCP server for Claude, Cursor, and other clients
- [Tools](https://docs.coval.dev/mcp/tools): Available MCP tools reference
- [Beginner's Guide](https://docs.coval.dev/mcp/beginners-guide): Getting started with MCP and Coval

## API Reference

- [Introduction](https://docs.coval.dev/api-reference/v1/introduction): Authentication, base URL, pagination, filtering, and error codes
- [OpenAPI Specs](https://api.coval.dev/v1/openapi): Machine-readable API specifications (15 resource specs)

## Optional

- [Use Cases](https://docs.coval.dev/use-cases/overview): Example evaluation scenarios by industry
- [Leveraging Test Users](https://docs.coval.dev/use-cases/leveraging-test-users): Using test user data for better evaluations
- [Hackathons](https://docs.coval.dev/collaborate/hackathons/overview): Community events and collaboration
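
The Agent Skills entry above quotes one concrete command for installing Coval's evaluation skills into an AI coding agent. As a minimal sketch (the `npx skills add` invocation is taken verbatim from that entry; it assumes Node.js is installed and fetches packages over the network, so treat it as a command fragment rather than a tested script):

```shell
# Install Coval evaluation skills into an AI coding agent.
# Requires Node.js; npx downloads the `skills` tool on demand.
npx skills add coval-ai/coval-external-skills
```

See the [Agent Skills](https://docs.coval.dev/agents/skills) page for what the installed skills provide.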