January

Aggregate statistics for Eval Runs

January 21st, 2025

We’ve added aggregate statistics to the Runs table to help you quickly compare performance across different Evaluators. You can view these statistics in the Runs tab of any Evaluation that contains Evaluators.

For boolean Evaluators, we show the percentage of true judgments. For number Evaluators, we display the average value. For select and multi-select Evaluators, we display a bar chart showing the distribution of the judgments.
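The aggregation rules above can be illustrated with a minimal sketch. This is not the product's implementation: the function name `aggregate_judgments`, the type labels, and the representation of judgments as plain Python values are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean


def aggregate_judgments(evaluator_type, judgments):
    """Summarize judgments for one Evaluator (hypothetical helper).

    Logs without a judgment are represented as None and are skipped.
    """
    values = [j for j in judgments if j is not None]
    if not values:
        return None  # nothing to aggregate yet
    if evaluator_type == "boolean":
        # Percentage of true judgments.
        return 100.0 * sum(1 for v in values if v) / len(values)
    if evaluator_type == "number":
        # Average value.
        return mean(values)
    if evaluator_type == "select":
        # Distribution: count of each chosen option.
        return dict(Counter(values))
    if evaluator_type == "multi_select":
        # Each judgment is a list of options; count every option chosen.
        return dict(Counter(option for v in values for option in v))
    raise ValueError(f"Unknown evaluator type: {evaluator_type}")
```

The counts returned for select and multi-select judgments are the kind of data a distribution bar chart would be drawn from.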

Run stats with a tooltip showing breakdown for an Issues Evaluator


Additional icons indicate the status of the Run as it relates to the aggregate statistic:

  • A spinning icon indicates that not all Logs have judgments and the Run is currently being executed. The displayed aggregate statistic may not be final.
  • A clock icon indicates that not all Logs have judgments, though the Run is not currently being executed.
  • A red warning icon indicates that errors occurred when running the Evaluator.
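The icon rules above can be read as a small decision procedure. The sketch below is one interpretation, not the product's logic: the function `run_status_icons`, its parameters, and the icon names are hypothetical, and it assumes that the warning icon can appear together with a spinner or clock.

```python
def run_status_icons(total_logs, judged_logs, error_count, is_running):
    """Return the status icons to show next to an aggregate statistic.

    All names here are illustrative assumptions, not a real API.
    """
    icons = []
    if judged_logs < total_logs:
        # Not all Logs have judgments: the statistic may not be final.
        icons.append("spinner" if is_running else "clock")
    if error_count > 0:
        # Errors occurred when running the Evaluator.
        icons.append("warning")
    return icons
```

For example, a Run that is still executing with 40 of 100 Logs judged and no errors would show only the spinner, while a complete Run with no errors would show no icon.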

Hover over these icons or aggregate statistics to view more details in the tooltip, such as the number of judgments and the number of errors (if any).

Select Eval Runs for comparison

January 21st, 2025

You can now compare Runs more easily by selecting the relevant ones in the Runs tab.

To filter to a subset of Runs, go to the Runs tab and select them by clicking the checkbox or by pressing x with your cursor on the row. Then, go to the Stats or Review tab to see the comparison between the selected Runs. Your control Run will always be included in the comparison.

Selecting Runs on Runs tab

Select a Run in the table

Select Runs for comparison

In the Review tab, the selected Run is displayed alongside the control Run.

Judgment filters in Review view

January 17th, 2025

You can now filter Logs by judgments in the Review tab of an Evaluation. This feature allows you to quickly retrieve specific Logs, such as those marked as “Good” or “Bad” by a subject-matter expert, or those with latency below a certain threshold.

Judgment filters

To filter Logs, click on the Filter button in the Review tab to set up your first filter.
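Conceptually, a judgment filter keeps only the Logs whose judgments match the chosen conditions. The sketch below shows that idea on in-memory data. It is an assumption-laden illustration: the Log dictionaries, the `judgments` and `latency` keys, and the `filter_logs` helper are hypothetical and do not reflect the product's API.

```python
def filter_logs(logs, judgment_filters=None, max_latency=None):
    """Keep Logs that match every judgment filter and the latency bound.

    judgment_filters maps an Evaluator name to the required judgment.
    max_latency, if given, keeps Logs with latency strictly below it.
    """
    judgment_filters = judgment_filters or {}
    matches = []
    for log in logs:
        judgments = log.get("judgments", {})
        # Reject the Log if any required judgment is missing or different.
        if any(judgments.get(name) != wanted
               for name, wanted in judgment_filters.items()):
            continue
        # Reject the Log if it is not below the latency threshold.
        if max_latency is not None and log.get("latency", float("inf")) >= max_latency:
            continue
        matches.append(log)
    return matches


# Hypothetical sample data for illustration only.
sample_logs = [
    {"id": 1, "latency": 0.8, "judgments": {"rating": "Good"}},
    {"id": 2, "latency": 2.5, "judgments": {"rating": "Bad"}},
    {"id": 3, "latency": 1.9, "judgments": {"rating": "Good"}},
]
good_logs = filter_logs(sample_logs, {"rating": "Good"})
fast_good_logs = filter_logs(sample_logs, {"rating": "Good"}, max_latency=1.0)
```

Here `good_logs` contains the Logs rated "Good", and `fast_good_logs` narrows that further to those with latency below the threshold.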
