Make the most out of the hackathon

Coval Prize:

Cash Prize: 500$

Challenge

Test your AI agents using your own metrics and create dashboards to display the results.

Use case Example: Airline AI Help Desk

Let’s walk through a concrete example of how you can use Coval to test an Airline AI Help Desk agent. You can adapt this example to your own hackathon use case.

Guide to test your AI agents using our web app

1

Create Account

See our Quickstart and create an account.

2

Create Test Set

There are 3 types of test sets that you can create:

  1. Transcript: Upload a conversation (transcript) that represents a user-agent interaction.

  2. Scenario: A natural-language description of how a user might interact with your agent.

  3. Graph: Define a workflow as a graph to test your agent’s ability to follow a series of steps.

Let’s keep it simple for this first example and create a Scenario test set.

You can use the Generate More button to create more test cases.
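A scenario test case is just a natural-language description of a user interaction. Here is a sketch of what a few scenarios for the airline example might look like, written as plain Python data; the field names are illustrative, not a Coval file format:

```python
# Hypothetical scenario test cases for the Airline AI Help Desk example.
# Each entry pairs a short name with a natural-language scenario describing
# how a user might interact with the agent. These field names are
# illustrative only, not a Coval schema.
scenarios = [
    {
        "name": "check_in_question",
        "scenario": "A traveler asks how and when they can check in for "
                    "tomorrow's flight to Denver.",
    },
    {
        "name": "baggage_allowance",
        "scenario": "A traveler wants to know the carry-on size limit and "
                    "whether a second checked bag costs extra.",
    },
    {
        "name": "flight_change",
        "scenario": "A traveler needs to move their flight to a later day "
                    "and asks about change fees.",
    },
]

for case in scenarios:
    print(f"{case['name']}: {case['scenario']}")
```

The Generate More button produces additional test cases in this same spirit: new natural-language scenarios that vary the user's intent and wording.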

3

Create Simulator

Now, suppose you want to test your agent not only on the exact inputs in the test set, but also on variations of those inputs. You might also want to give the Coval testing agent a persona and provide more details about the context and testing scenario. To test your agent in a more realistic environment, you can create a Simulator.

Let’s go back to our example and create a simulator for a confused-traveler Coval user agent.

Find more about simulators here.
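Conceptually, a simulator combines a persona and extra context with each base test case to produce varied, more realistic conversations. The sketch below illustrates that idea in plain Python; the configuration keys and the `build_prompts` helper are hypothetical, not part of Coval:

```python
# Hypothetical simulator configuration for the confused-traveler persona.
# The keys here illustrate the kind of details you can provide; they are
# not a Coval schema.
simulator_config = {
    "persona": "A confused first-time traveler who is polite but easily "
               "overwhelmed by airline jargon.",
    "context": "The traveler is flying internationally tomorrow and has "
               "never checked in online before.",
    "variations": 3,  # how many varied conversations to run per test case
}


def build_prompts(base_scenario: str, cfg: dict) -> list[str]:
    """Combine persona, context, and a base scenario into one prompt per
    requested variation (an illustration of the concept, not Coval code)."""
    return [
        f"Persona: {cfg['persona']}\n"
        f"Context: {cfg['context']}\n"
        f"Scenario (variation {i + 1}): {base_scenario}"
        for i in range(cfg["variations"])
    ]


prompts = build_prompts(
    "A traveler asks about the check-in process.", simulator_config
)
print(len(prompts))  # 3
```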

4

Create Metrics

Now, let’s create metrics to evaluate the performance of our agent. We can use a Customizable Metric, one of our pre-built metrics, or create our own. For this example, let’s create our own metric to evaluate whether the user agent received information about the check-in process, and use the existing Sentiment Analysis metric to evaluate the agent’s ability to communicate in a friendly and professional manner.
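To make the two metrics concrete, here is a toy sketch of what each one checks. This is not Coval’s implementation (Coval’s metrics are far more robust); it only illustrates the questions the metrics answer, using a naive keyword check and a crude word-count sentiment proxy:

```python
# Toy sketches of the two metrics from this example -- NOT Coval's
# implementation, just an illustration of what each metric evaluates.

def checkin_info_received(transcript: str) -> bool:
    """Did the agent convey check-in information?
    Here: a naive keyword check over the transcript."""
    keywords = ("check-in", "check in", "boarding pass")
    text = transcript.lower()
    return any(k in text for k in keywords)


def sentiment_score(transcript: str) -> float:
    """Crude sentiment proxy: share of friendly vs. negative words,
    scaled to [0, 1] (0.5 when no signal words appear)."""
    positive = ("happy", "glad", "welcome", "thank")
    negative = ("sorry", "cannot", "unfortunately")
    text = transcript.lower()
    pos = sum(text.count(w) for w in positive)
    neg = sum(text.count(w) for w in negative)
    total = pos + neg
    return 0.5 if total == 0 else pos / total


transcript = (
    "Agent: Welcome! Happy to help. Online check-in opens 24 hours "
    "before departure; you'll get your boarding pass by email."
)
print(checkin_info_received(transcript))  # True
print(sentiment_score(transcript))        # 1.0
```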

5

Launching Evaluation

Now, we’re one step away from testing our agent. Launch the evaluation by pressing the Run Evaluation button on our test set and filling in the details for our evaluation. Don’t forget to select the simulator and metrics that we created.

6

Review Results

Now, let’s review the results of our evaluation. Here is an example of what the results might look like for the Sentiment Analysis metric. In the Scorecard tab, you can check the results for the metric we created earlier.
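Since the challenge also asks for dashboards, here is a minimal sketch of aggregating per-test metric results into dashboard-style summary numbers. The result records are made up for illustration; they are not Coval’s export format:

```python
# Hypothetical per-test results, aggregated into simple dashboard numbers.
# These records are invented for illustration, not a Coval export format.
results = [
    {"test": "check_in_question", "sentiment": 0.9, "passed": True},
    {"test": "baggage_allowance", "sentiment": 0.7, "passed": True},
    {"test": "flight_change",     "sentiment": 0.8, "passed": False},
]

avg_sentiment = sum(r["sentiment"] for r in results) / len(results)
pass_rate = sum(r["passed"] for r in results) / len(results)

print(f"avg sentiment: {avg_sentiment:.2f}")   # avg sentiment: 0.80
print(f"pass rate: {pass_rate:.0%}")           # pass rate: 67%
```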

Congratulations 🎉! You’ve just tested your agent.
