
Prerequisites

1. Get Your API Key

Navigate to your Coval dashboard and generate an API key from Settings.
2. Add GitHub Secret

  1. Go to your repository Settings > Secrets and variables > Actions
  2. Click New repository secret
  3. Name: COVAL_API_KEY
  4. Value: Your Coval API key
3. Gather Required IDs

You’ll need the following identifiers:
  • Agent ID (22 chars): Found in Agents page → Select agent → Copy ID
  • Persona ID (22 chars): Found in Personas page → Select persona → Copy ID
  • Test Set ID (8 chars): Found in Test Sets page → Select test set → Copy ID
  • Metric IDs (22 chars each, optional): Found in Metrics page → Click metric → Copy ID
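A quick length check on these identifiers catches copy-paste mistakes before they reach a workflow run. A minimal sketch, assuming the lengths documented in the list above (the example IDs are placeholders):

```python
# Expected Coval identifier lengths, per the list above.
EXPECTED_LENGTHS = {
    "agent_id": 22,
    "persona_id": 22,
    "test_set_id": 8,
    "metric_id": 22,
}

def check_id(kind: str, value: str) -> bool:
    """Return True if the identifier has the documented length."""
    return len(value) == EXPECTED_LENGTHS[kind]

print(check_id("test_set_id", "aB1cD2eF"))  # True: 8 characters
print(check_id("agent_id", "tooShort"))     # False: not 22 characters
```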

Quick Start

Automatic PR Checks

Create .github/workflows/coval-eval.yml:
name: Coval Evaluation

on:
  pull_request:
    branches: [main]

jobs:
  evaluate-agent:
    runs-on: ubuntu-latest
    steps:
      - name: Run Coval Evaluation
        uses: coval-ai/coval-github-action@v1
        env:
          COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
        with:
          agent_id: "gk3jK9mPq2xRt5vW8yZaBc"
          persona_id: "hL4kL0nQr3ySt6vX9zAcDd"
          test_set_id: "aB1cD2eF"

Manual Workflow Dispatch

Create .github/workflows/manual-eval.yml:
name: Manual Evaluation

on:
  workflow_dispatch:
    inputs:
      agent_id:
        description: "Agent ID (22 characters)"
        required: true
        type: string
      persona_id:
        description: "Persona ID (22 characters)"
        required: true
        type: string
      test_set_id:
        description: "Test Set ID (8 characters)"
        required: true
        type: string

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - name: Run Evaluation
        uses: coval-ai/coval-github-action@v1
        env:
          COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
        with:
          agent_id: ${{ inputs.agent_id }}
          persona_id: ${{ inputs.persona_id }}
          test_set_id: ${{ inputs.test_set_id }}
To trigger:
  1. Navigate to Actions tab
  2. Select Manual Evaluation
  3. Click Run workflow
  4. Enter your IDs and click Run workflow

Advanced Configuration

Custom Metrics and Options

- name: Advanced Evaluation
  uses: coval-ai/coval-github-action@v1
  env:
    COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
  with:
    agent_id: "gk3jK9mPq2xRt5vW8yZaBc"
    persona_id: "hL4kL0nQr3ySt6vX9zAcDd"
    test_set_id: "aB1cD2eF"
    # Specific metrics to evaluate
    metric_ids: '["iM5lM1oRs4zTu7wY0aBdEe", "jN6mN2pSt5aUv8xZ1bCeFf"]'
    # Run each test case 3 times
    iteration_count: 3
    # Run 2 simulations concurrently
    concurrency: 2
    # Custom metadata for tracking
    metadata: '{"campaign": "q4_2025", "env": "staging"}'

Using Outputs

- name: Run Evaluation
  id: coval
  uses: coval-ai/coval-github-action@v1
  env:
    COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
  with:
    agent_id: "gk3jK9mPq2xRt5vW8yZaBc"
    persona_id: "hL4kL0nQr3ySt6vX9zAcDd"
    test_set_id: "aB1cD2eF"

- name: Post Results
  run: |
    echo "Run ID: ${{ steps.coval.outputs.run_id }}"
    echo "Status: ${{ steps.coval.outputs.status }}"
    echo "View: ${{ steps.coval.outputs.run_url }}"

- name: Comment on PR
  if: github.event_name == 'pull_request'
  uses: actions/github-script@v7
  with:
    script: |
      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: '✓ Evaluation complete: ${{ steps.coval.outputs.run_url }}'
      })

Configuration Reference

Inputs

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| agent_id | string | Yes | - | Agent to test (22 chars) |
| persona_id | string | Yes | - | Simulated persona (22 chars) |
| test_set_id | string | Yes | - | Test set with test cases (8 chars) |
| metric_ids | JSON array | No | Agent defaults | Metric IDs to evaluate (22 chars each) |
| iteration_count | integer | No | 1 | Runs per test case (1-10) |
| concurrency | integer | No | 1 | Concurrent simulations (1-5) |
| metadata | JSON object | No | {} | Custom metadata for tracking |
| max_wait_time | integer | No | 600 | Max wait time in seconds |
| check_interval | integer | No | 30 | Status check interval in seconds |
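The numeric constraints in this table can be checked before a run is launched. The validator below is an illustrative sketch of that pre-flight check, not part of the action itself:

```python
def validate_options(iteration_count: int = 1, concurrency: int = 1) -> list:
    """Collect constraint violations for the numeric inputs (ranges from the table above)."""
    errors = []
    if not 1 <= iteration_count <= 10:
        errors.append("iteration_count: Value must be between 1 and 10")
    if not 1 <= concurrency <= 5:
        errors.append("concurrency: Value must be between 1 and 5")
    return errors

print(validate_options(iteration_count=3, concurrency=2))  # []
print(validate_options(iteration_count=15))                # one violation reported
```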

Outputs

| Output | Type | Description |
|---|---|---|
| run_id | string | Unique run identifier |
| status | string | Final status (COMPLETED, FAILED, etc.) |
| run_url | string | Dashboard URL to view results |

Environment Variables

| Variable | Required | Description |
|---|---|---|
| COVAL_API_KEY | Yes | Your Coval API key |

API Details

The action uses the Coval v1 Runs API:

Launch Run

Endpoint: POST https://api.coval.dev/v1/runs

Request:
{
  "agent_id": "gk3jK9mPq2xRt5vW8yZaBc",
  "persona_id": "hL4kL0nQr3ySt6vX9zAcDd",
  "test_set_id": "aB1cD2eF",
  "metric_ids": ["iM5lM1oRs4zTu7wY0aBdEe"],
  "options": {
    "iteration_count": 3,
    "concurrency": 2
  },
  "metadata": {
    "campaign": "q4_2025"
  }
}
Response:
{
  "run": {
    "run_id": "8EktrIgaVxn9LfxkIynagX",
    "status": "PENDING",
    "create_time": "2025-10-14T12:00:00Z"
  }
}
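Outside of CI, the same endpoint can be called directly. The sketch below assembles the request body shown above and POSTs it with Python's standard library. Note the Bearer authorization scheme is an assumption made for illustration; confirm how your API key should be sent before relying on it:

```python
import json
import urllib.request

API_URL = "https://api.coval.dev/v1/runs"

def build_payload(agent_id, persona_id, test_set_id,
                  metric_ids=None, options=None, metadata=None):
    """Assemble the Launch Run request body, omitting empty optional fields."""
    payload = {
        "agent_id": agent_id,
        "persona_id": persona_id,
        "test_set_id": test_set_id,
    }
    if metric_ids:
        payload["metric_ids"] = metric_ids
    if options:
        payload["options"] = options
    if metadata:
        payload["metadata"] = metadata
    return payload

def launch_run(api_key, payload):
    # Assumption: the API key is sent as a Bearer token.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["run"]
```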

Monitor Run

Endpoint: GET https://api.coval.dev/v1/runs/{run_id}

Response:
{
  "run": {
    "run_id": "8EktrIgaVxn9LfxkIynagX",
    "status": "IN PROGRESS",
    "progress": {
      "total_test_cases": 10,
      "completed_test_cases": 5,
      "failed_test_cases": 0,
      "in_progress_test_cases": 1
    }
  }
}
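A polling loop against this endpoint is what the action's max_wait_time and check_interval inputs drive internally. A minimal sketch of the bookkeeping, using the progress fields from the response above and the terminal statuses from the Run Statuses table:

```python
# Statuses after which polling should stop (see Run Statuses below).
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def is_terminal(status: str) -> bool:
    """True once the run has finished, successfully or not."""
    return status in TERMINAL_STATUSES

def progress_fraction(progress: dict) -> float:
    """Completed share of the run, from the response's progress block."""
    total = progress["total_test_cases"]
    return progress["completed_test_cases"] / total if total else 0.0

sample = {"total_test_cases": 10, "completed_test_cases": 5,
          "failed_test_cases": 0, "in_progress_test_cases": 1}
print(progress_fraction(sample))   # 0.5
print(is_terminal("IN PROGRESS"))  # False
```

A real loop would GET the run every check_interval seconds until is_terminal() returns True or max_wait_time elapses.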

Run Statuses

| Status | Description |
|---|---|
| PENDING | Waiting to start |
| IN QUEUE | Queued for execution |
| IN PROGRESS | Running test cases |
| COMPLETED | Successfully completed |
| FAILED | Run failed |

Examples

Environment-Based Testing

name: Multi-Environment Testing

on:
  push:
    branches: [main, staging, dev]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - name: Set Environment
        id: env
        run: |
          if [[ "${{ github.ref }}" == "refs/heads/main" ]]; then
            echo "agent=prodAgentId1234567890" >> $GITHUB_OUTPUT
            echo "env=production" >> $GITHUB_OUTPUT
          elif [[ "${{ github.ref }}" == "refs/heads/staging" ]]; then
            echo "agent=stgAgentId12345678901" >> $GITHUB_OUTPUT
            echo "env=staging" >> $GITHUB_OUTPUT
          else
            echo "agent=devAgentId12345678901" >> $GITHUB_OUTPUT
            echo "env=development" >> $GITHUB_OUTPUT
          fi

      - name: Evaluate
        uses: coval-ai/coval-github-action@v1
        env:
          COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
        with:
          agent_id: ${{ steps.env.outputs.agent }}
          persona_id: "hL4kL0nQr3ySt6vX9zAcDd"
          test_set_id: "aB1cD2eF"
          metadata: '{"env": "${{ steps.env.outputs.env }}", "commit": "${{ github.sha }}"}'

Parallel Persona Testing

name: Multi-Persona Testing

on:
  workflow_dispatch:

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        persona:
          - { id: "persona1234567890abcde", name: "Friendly" }
          - { id: "persona1234567890fghij", name: "Frustrated" }
          - { id: "persona1234567890klmno", name: "Technical" }
    steps:
      - name: Test ${{ matrix.persona.name }}
        uses: coval-ai/coval-github-action@v1
        env:
          COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
        with:
          agent_id: "gk3jK9mPq2xRt5vW8yZaBc"
          persona_id: ${{ matrix.persona.id }}
          test_set_id: "aB1cD2eF"
          metadata: '{"persona": "${{ matrix.persona.name }}"}'

Scheduled Regression Testing

name: Nightly Regression

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM daily

jobs:
  regression:
    runs-on: ubuntu-latest
    steps:
      - name: Run Tests
        uses: coval-ai/coval-github-action@v1
        env:
          COVAL_API_KEY: ${{ secrets.COVAL_API_KEY }}
        with:
          agent_id: "gk3jK9mPq2xRt5vW8yZaBc"
          persona_id: "hL4kL0nQr3ySt6vX9zAcDd"
          test_set_id: "regrTest"
          iteration_count: 5
          concurrency: 3
          max_wait_time: 1800

Troubleshooting

Invalid API Key

Status Code: 401
Error Code: UNAUTHENTICATED
Message: Invalid or missing API key
Solution: Verify COVAL_API_KEY is set correctly in GitHub Secrets.

Invalid Agent ID

Status Code: 400
Error Code: INVALID_ARGUMENT
Message: Invalid agent_id: Agent not found
Solution: Confirm the agent ID is 22 characters and exists in your organization.

Validation Errors

Status Code: 400
Details:
  - iteration_count: Value must be between 1 and 10
Solution: Ensure all parameters meet the constraints listed in the Configuration Reference.

Timeout

Solution: Increase max_wait_time for larger test sets or check the Coval dashboard for run status.

Invalid JSON

# Wrong - will fail
metric_ids: ["id1", "id2"]

# Correct - use single quotes around JSON
metric_ids: '["id1", "id2"]'
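The quoting matters because the action expects a JSON string: without the surrounding single quotes, YAML parses the brackets as a native list rather than a string. One safe approach is to generate the value programmatically, e.g. with json.dumps (illustrative sketch):

```python
import json

metric_ids = ["iM5lM1oRs4zTu7wY0aBdEe", "jN6mN2pSt5aUv8xZ1bCeFf"]

# json.dumps yields exactly the string form the action expects.
value = json.dumps(metric_ids)
print(value)  # ["iM5lM1oRs4zTu7wY0aBdEe", "jN6mN2pSt5aUv8xZ1bCeFf"]

# Round-trips cleanly, confirming it is valid JSON.
assert json.loads(value) == metric_ids
```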

Resources