Agent Skills are modular knowledge packages that teach your AI coding agent how to evaluate effectively. They follow the open Agent Skills standard and work with Claude Code, Cursor, Windsurf, Codex, and 40+ other agents.

Install

npx skills add coval-ai/coval-external-skills
This installs all Coval skills into your agent’s skills directory. Skills load on demand: only each skill’s name and description sit in the agent’s context until the skill is activated.

Skills vs MCP vs CLI

|  | Skills | MCP Server | CLI |
| --- | --- | --- | --- |
| What it provides | Knowledge (how to evaluate well) | Tools (execute operations) | Operations (run from terminal) |
| Install | npx skills add coval-ai/coval-external-skills | npx coval-mcp | brew install coval-ai/tap/coval |
| Use when | Agent needs to design evaluations | Agent needs to run evaluations natively | Scripting, CI/CD, any terminal |
| Works with | Any agent supporting skills | MCP-compatible clients | Any shell environment |
We recommend Skills + CLI for the most complete experience. Skills teach your agent what to create, and the CLI executes it with structured JSON output.

Available Skills

Onboarding

onboard

Interactive guided setup for your first evaluation. Walks through connecting an agent, creating personas, building test cases, selecting metrics, and launching a run.

Runs

| Skill | Description |
| --- | --- |
| launch-run | Launch an evaluation run against an AI agent |
| watch-run | Monitor a run’s progress with live status updates |
| quick-eval | Full workflow: launch, watch, and summarize results in one go |

Simulations

| Skill | Description |
| --- | --- |
| get-results | Retrieve and analyze simulation results from a run |
| download-audio | Download audio recordings from voice simulations |

Resources

| Skill | Description |
| --- | --- |
| coval-resources | Complete reference for all Coval resources, their hierarchy, relationships, API endpoints, and ID formats |

Dashboards

| Skill | Description |
| --- | --- |
| create-dashboard | Create a new dashboard and populate it with metric widgets |
| add-widget | Add a chart, table, or text widget to a dashboard |
| manage-dashboard | Get, update, or delete a dashboard |
| manage-widgets | List, update, resize, or delete widgets |
| list-dashboards | List all dashboards with filtering |

Test Cases

| Skill | Description |
| --- | --- |
| huggingface-import | Import datasets from HuggingFace and convert them to Coval test sets |

Migrations

| Skill | Description |
| --- | --- |
| migrate-bluejay | Migrate configuration from the Bluejay voice AI testing platform to Coval |

Human Review

| Skill | Description |
| --- | --- |
| review-llm-annotations-and-improve-prompt | Calculate agreement between human and machine labels, then propose improved metric prompts |
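
To make "agreement between human and machine labels" concrete, here is a minimal Python sketch of two common agreement measures: raw percent agreement and Cohen's kappa, which corrects for agreement expected by chance. This page does not say which statistic the skill actually computes, so treat the choice of kappa, the function names, and the sample labels as illustrative assumptions.

```python
from collections import Counter


def percent_agreement(human: list[str], machine: list[str]) -> float:
    """Fraction of items where the human and machine labels match."""
    if not human or len(human) != len(machine):
        raise ValueError("label lists must be non-empty and the same length")
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def cohens_kappa(human: list[str], machine: list[str]) -> float:
    """Chance-corrected agreement between two labelers (Cohen's kappa)."""
    n = len(human)
    observed = percent_agreement(human, machine)
    human_counts = Counter(human)
    machine_counts = Counter(machine)
    # Probability that both labelers pick the same label purely by chance.
    expected = sum(
        human_counts[label] * machine_counts[label] for label in human_counts
    ) / (n * n)
    if expected == 1.0:
        # Both labelers used a single identical label; kappa is undefined,
        # so report perfect agreement by convention (an assumption here).
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for four reviewed simulations.
human_labels = ["pass", "pass", "fail", "fail"]
machine_labels = ["pass", "fail", "fail", "fail"]
print(percent_agreement(human_labels, machine_labels))
print(cohens_kappa(human_labels, machine_labels))
```

A low kappa alongside a high raw agreement usually means one label dominates the data, which is a useful signal before rewriting a metric prompt.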

How Skills Work

Skills use progressive disclosure to stay lightweight:
  1. At startup (~100 tokens per skill): Only the name and description are loaded
  2. When activated (under 5000 tokens): The full skill instructions load when your agent detects a relevant task
  3. On demand: Reference files (templates, examples) load only when needed
This means having all Coval skills installed adds minimal overhead to your agent’s context.
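
The three stages above can be sketched in Python as follows. This is not how any particular agent implements skill loading. It assumes each SKILL.md opens with a `---`-delimited block of simple `key: value` lines holding `name` and `description`, and the names `SkillStub`, `discover`, and `activate` are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class SkillStub:
    """What stays in context at startup: name and description only."""
    name: str
    description: str
    path: Path


def read_frontmatter(skill_md: Path) -> dict[str, str]:
    """Read `key: value` pairs from the leading '---' block, then stop."""
    meta: dict[str, str] = {}
    with skill_md.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta  # no frontmatter block (assumed format not present)
        for line in fh:
            if line.strip() == "---":
                break  # end of frontmatter; the instructions are not read
            key, sep, value = line.partition(":")
            if sep:
                meta[key.strip()] = value.strip()
    return meta


def discover(skills_dir: Path) -> list[SkillStub]:
    """Stage 1: index every installed skill by name and description."""
    stubs = []
    for skill_md in sorted(skills_dir.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_md)
        stubs.append(
            SkillStub(
                name=meta.get("name", skill_md.parent.name),
                description=meta.get("description", ""),
                path=skill_md,
            )
        )
    return stubs


def activate(stub: SkillStub) -> str:
    """Stage 2: load the full instructions once a task matches the skill."""
    return stub.path.read_text(encoding="utf-8")
```

Stage 3 follows the same pattern: files under `references/` would be read only when the activated instructions point to them.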

Skill Structure

Each skill follows the Agent Skills spec:
skill-name/
├── SKILL.md          # Instructions (required)
├── references/       # Templates, detailed docs (optional)
├── scripts/          # Executable code (optional)
└── assets/           # Static resources (optional)
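
As a quick way to check a skill folder against the tree above, the following Python sketch verifies that SKILL.md is present and that the optional entries, if they exist, are directories. The function name `check_skill_layout` is hypothetical and the checks cover only what the tree shows, not the full Agent Skills spec.

```python
from pathlib import Path

# Optional subdirectories named in the layout above.
OPTIONAL_DIRS = ("references", "scripts", "assets")


def check_skill_layout(skill_dir: Path) -> list[str]:
    """Return a list of layout problems; an empty list means none were found."""
    problems = []
    if not (skill_dir / "SKILL.md").is_file():
        problems.append("missing required SKILL.md")
    for name in OPTIONAL_DIRS:
        entry = skill_dir / name
        # Optional entries may be absent, but if present must be directories.
        if entry.exists() and not entry.is_dir():
            problems.append(f"{name} exists but is not a directory")
    return problems
```

For example, `check_skill_layout(Path("skill-name"))` returns an empty list for a folder that contains SKILL.md and any subset of the three optional directories.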

Source Code

All skills are open source: github.com/coval-ai/coval-external-skills