Tool Call Metrics
Metrics Overview
This system evaluates the performance of an assistant by analyzing tool calls made during interactions with users. The evaluation is based on specific metrics that measure the appropriateness and correctness of these tool calls. Below are the metrics used in the system:
Appropriate Tool Call
This metric assesses whether the assistant selected the appropriate tool call (e.g., getItemInfo, addItemToOrder, removeItem) based on the user’s query and the conversation context. The correctness of the tool call parameters is not evaluated here; only the appropriateness of the tool call in relation to fulfilling the user’s request is.
Purpose: Ensures that the assistant’s choice of tool call is relevant to the user’s needs and the current state of the conversation.
Example: If the user asked to add an item, the assistant should choose the addItemToOrder tool call.
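As a rough illustration, a check for this metric might compare only the tool name and ignore the arguments. The ToolCall record and the is_appropriate_tool_call helper below are hypothetical names invented for this sketch; they are not part of the system described here.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """Hypothetical record of one tool call made by the assistant."""
    name: str
    arguments: dict[str, Any] = field(default_factory=dict)


def is_appropriate_tool_call(actual: ToolCall, expected_name: str) -> bool:
    # Only the tool name is compared. Arguments are deliberately ignored,
    # because parameter correctness is covered by a separate metric.
    return actual.name == expected_name


# The user asked to add an item, so addItemToOrder is the expected tool.
call = ToolCall(name="addItemToOrder", arguments={"item": "cheeseburger"})
print(is_appropriate_tool_call(call, "addItemToOrder"))  # True
print(is_appropriate_tool_call(call, "removeItem"))      # False
```

Keeping the arguments out of this comparison means a call with the right tool but a wrong size still passes here and fails only on the parameter metric.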
Correct Tool Call Parameters
This metric evaluates whether the assistant provided the correct parameters (arguments) when making a tool call. For example, it checks if the assistant added the correct size, side options, or other item details as specified by the user during the conversation.
Purpose: Ensures that the tool call accurately reflects the user’s order specifications.
Example: If the user requested a “medium combo with onion rings and an Oreo shake,” the assistant should pass these exact parameters in the addItemToOrder tool call.
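One way to picture this check is as a key-by-key comparison between the arguments the assistant passed and the arguments implied by the user’s request. The sketch below assumes arguments are flat dictionaries; the parameter names (item, size, side, drink) and the parameter_mismatches helper are hypothetical and chosen only for illustration.

```python
def parameter_mismatches(actual: dict, expected: dict) -> dict:
    """Return {key: (actual_value, expected_value)} for every differing key.

    A key that is absent on one side is reported with None on that side.
    An empty result means the parameters match exactly.
    """
    keys = set(actual) | set(expected)
    return {
        key: (actual.get(key), expected.get(key))
        for key in keys
        if actual.get(key) != expected.get(key)
    }


# Expected arguments for "medium combo with onion rings and an Oreo shake".
expected = {"item": "combo", "size": "medium", "side": "onion rings", "drink": "Oreo shake"}

correct = dict(expected)
wrong_side = {**expected, "side": "fries"}

print(parameter_mismatches(correct, expected))     # {}
print(parameter_mismatches(wrong_side, expected))  # {'side': ('fries', 'onion rings')}
```

Returning the mismatches rather than a bare pass/fail makes it easier to see which detail of the order the assistant got wrong.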
Correct Removal of Item
This metric checks whether the assistant correctly identified and removed the appropriate item from the user’s order as requested. It ensures that the assistant understands which item to remove based on the context of the conversation and the user’s specific instructions.
Purpose: Validates that the assistant is correctly handling item removals, which is critical in maintaining an accurate order.
Example: If the user asked to remove “the large fries,” the assistant should accurately remove that specific item, not another one by mistake.
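A minimal sketch of such a check, assuming each order item carries a unique id and that the evaluator can see the order before and after the removeItem call. The item fields and the is_correct_removal helper are hypothetical.

```python
def is_correct_removal(order_before: list, order_after: list, expected_removed_id: str) -> bool:
    """Pass only if exactly the expected item disappeared from the order."""
    before_ids = [item["id"] for item in order_before]
    after_ids = [item["id"] for item in order_after]
    # The order that should remain once the requested item is removed.
    expected_after = [item_id for item_id in before_ids if item_id != expected_removed_id]
    return expected_removed_id in before_ids and after_ids == expected_after


order = [
    {"id": "a1", "name": "fries", "size": "large"},
    {"id": "a2", "name": "cola", "size": "medium"},
]

# The user asked to remove "the large fries" (id "a1").
removed_fries = [item for item in order if item["id"] != "a1"]
removed_cola = [item for item in order if item["id"] != "a2"]

print(is_correct_removal(order, removed_fries, "a1"))  # True
print(is_correct_removal(order, removed_cola, "a1"))   # False
```

Comparing the full remaining order, not just the absence of one item, also catches the case where the assistant removes the right item along with an extra one.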
Accurate Modification of Item
This metric evaluates whether any requested modifications to an item in the user’s order were applied correctly. It looks at updates to item attributes such as size, combo options, or other details. If the tool call was not for modifying an item, this metric is marked as passed.
Purpose: Ensures that modifications to existing items in the order (e.g., changing the size or combo options) are executed accurately and without errors.
Example: If the user asked to change the drink size from “medium” to “large” for an existing item in the order, the assistant should correctly apply the modification.
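The rule above, including the automatic pass for calls that do not modify an item, could be sketched as follows. The tool name modifyItem, the item fields, and the is_accurate_modification helper are assumptions made for this example; the source does not name the modification tool.

```python
def is_accurate_modification(
    tool_name: str,
    item_before: dict,
    item_after: dict,
    requested_changes: dict,
    modify_tool: str = "modifyItem",  # hypothetical tool name
) -> bool:
    # Calls that are not modifications are marked as passed for this metric.
    if tool_name != modify_tool:
        return True
    # The item should equal its previous state with only the requested
    # attributes overwritten; any other difference counts as an error.
    expected = {**item_before, **requested_changes}
    return item_after == expected


before = {"name": "cola", "size": "medium"}

# The user asked to change the drink size from "medium" to "large".
print(is_accurate_modification("modifyItem", before, {"name": "cola", "size": "large"}, {"size": "large"}))   # True
print(is_accurate_modification("modifyItem", before, {"name": "cola", "size": "medium"}, {"size": "large"}))  # False
print(is_accurate_modification("addItemToOrder", before, before, {}))                                         # True
```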