## Install
## Skills vs MCP vs CLI
| | Skills | MCP Server | CLI |
|---|---|---|---|
| What it provides | Knowledge (how to evaluate well) | Tools (execute operations) | Operations (run from terminal) |
| Install | `npx skills add coval-ai/coval-external-skills` | `npx coval-mcp` | `brew install coval-ai/tap/coval` |
| Use when | Agent needs to design evaluations | Agent needs to run evaluations natively | Scripting, CI/CD, any terminal |
| Works with | Any agent supporting skills | MCP-compatible clients | Any shell environment |
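The install commands from the table can be run directly from a terminal. A typical setup uses only one of them; they are shown together here for reference:

```shell
# Add the skills bundle to a skills-capable agent
npx skills add coval-ai/coval-external-skills

# Or run the MCP server for MCP-compatible clients
npx coval-mcp

# Or install the standalone CLI via Homebrew
brew install coval-ai/tap/coval
```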
## Available Skills
### Onboarding

| Skill | Description |
|---|---|
| onboard | Interactive guided setup for your first evaluation: walks through connecting an agent, creating personas, building test cases, selecting metrics, and launching a run |
### Runs
| Skill | Description |
|---|---|
| launch-run | Launch an evaluation run against an AI agent |
| watch-run | Monitor a run’s progress with live status updates |
| quick-eval | Full workflow: launch, watch, and summarize results in one go |
### Simulations
| Skill | Description |
|---|---|
| get-results | Retrieve and analyze simulation results from a run |
| download-audio | Download audio recordings from voice simulations |
### Resources
| Skill | Description |
|---|---|
| coval-resources | Complete reference for all Coval resources, their hierarchy, relationships, API endpoints, and ID formats |
### Dashboards
| Skill | Description |
|---|---|
| create-dashboard | Create a new dashboard and populate it with metric widgets |
| add-widget | Add a chart, table, or text widget to a dashboard |
| manage-dashboard | Get, update, or delete a dashboard |
| manage-widgets | List, update, resize, or delete widgets |
| list-dashboards | List all dashboards with filtering |
### Test Cases
| Skill | Description |
|---|---|
| huggingface-import | Import datasets from HuggingFace and convert them to Coval test sets |
### Migrations
| Skill | Description |
|---|---|
| migrate-bluejay | Migrate configuration from Bluejay voice AI testing platform to Coval |
### Human Review
| Skill | Description |
|---|---|
| review-llm-annotations-and-improve-prompt | Calculate agreement between human and machine labels, then propose improved metric prompts |
## How Skills Work
Skills use progressive disclosure to stay lightweight:

- At startup (~100 tokens per skill): only the `name` and `description` are loaded
- When activated (under 5,000 tokens): the full skill instructions load when your agent detects a relevant task
- On demand: reference files (templates, examples) load only when needed
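The staged loading above can be sketched with plain files. This is a minimal illustration only; the `skills/quick-eval` directory layout and file names are assumptions, not Coval's actual on-disk format:

```shell
# Illustrative sketch of progressive disclosure (layout is assumed,
# not Coval's real format).
mkdir -p skills/quick-eval

# Lightweight metadata: the only part loaded at startup (~100 tokens)
printf 'name: quick-eval\ndescription: Launch, watch, and summarize results\n' \
  > skills/quick-eval/metadata.yaml

# Full instructions: read only when the agent activates the skill
printf 'Full step-by-step instructions for the quick-eval workflow...\n' \
  > skills/quick-eval/SKILL.md

# Startup cost: read just the metadata
cat skills/quick-eval/metadata.yaml

# Activation: load the full instructions on demand
cat skills/quick-eval/SKILL.md
```

Reference files (templates, examples) would sit alongside `SKILL.md` and be read only when a step actually needs them.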