Create an Evaluation.
Create a new Evaluation by specifying the Dataset, the versions to be evaluated (Evaluatees), and which Evaluators should provide judgments.
Humanloop will automatically start generating Logs and running Evaluators where `orchestrated=true`. If you own the runtime for the Evaluatee or Evaluator, you can set `orchestrated=false` and then generate and submit the required Logs using your runtime.
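As an illustration, the sketch below creates an Evaluation over HTTP. The base URL, auth header, and the exact request field names (`name`, `file`, `dataset`, `evaluatees`, `evaluators`, `orchestrated`) are assumptions based on the field descriptions on this page, so treat it as a sketch rather than the definitive request shape.

```python
import os

import requests

# Minimal sketch of creating an Evaluation. The base URL, auth header, and
# request field names below are illustrative assumptions drawn from the field
# descriptions on this page, not the authoritative request schema.
API_BASE = "https://api.humanloop.com/v5"                  # assumed base URL
HEADERS = {"X-API-KEY": os.environ["HUMANLOOP_API_KEY"]}   # assumed auth header

payload = {
    "name": "my-regression-check",                       # unique within the associated File
    "file": {"id": "<file-id>"},                         # File to associate with the Evaluation
    "dataset": {"version_id": "<dataset-version-id>"},   # Dataset to use
    "evaluatees": [
        # Prompt/Tool Versions to include. With orchestrated=False you would
        # generate and submit the required Logs from your own runtime instead.
        {"version_id": "<prompt-version-id>", "orchestrated": True},
    ],
    "evaluators": [
        {"version_id": "<evaluator-version-id>", "orchestrated": True},
    ],
}

response = requests.post(f"{API_BASE}/evaluations", json=payload, headers=HEADERS)
response.raise_for_status()
evaluation = response.json()
print(evaluation["id"], evaluation["status"])  # id starts with "evr"
```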
To keep updated on the progress of the Evaluation, you can poll the Evaluation using the `GET /evaluations/:id` endpoint and check its status.
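A polling loop along these lines would work; the path mirrors `GET /evaluations/:id` and the `status` values described below, while the client setup is the same assumed sketch as above.

```python
import time

import requests

API_BASE = "https://api.humanloop.com/v5"    # assumed base URL, as above
HEADERS = {"X-API-KEY": "<your-api-key>"}    # assumed auth header, as above


def wait_for_evaluation(evaluation_id: str, poll_seconds: float = 5.0) -> dict:
    """Poll GET /evaluations/:id until the Evaluation reaches a terminal status."""
    while True:
        response = requests.get(f"{API_BASE}/evaluations/{evaluation_id}", headers=HEADERS)
        response.raise_for_status()
        evaluation = response.json()
        # "completed" and "cancelled" are terminal; "pending" and "running" mean
        # Humanloop (or your runtime) is still producing Logs and Evaluator Logs.
        if evaluation["status"] in ("completed", "cancelled"):
            return evaluation
        time.sleep(poll_seconds)
```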
Dataset to use in this Evaluation.
The Evaluators used to evaluate.
Unique identifiers for the Prompt/Tool Versions to include in the Evaluation. Can be left unpopulated if you wish to add Evaluatees to this Evaluation by specifying `evaluation_id` in Log calls.
Name of the Evaluation to help identify it. Must be unique within the associated File.
The File to associate with the Evaluation.
Unique identifier for the Evaluation. Starts with `evr`.
The Dataset used in the Evaluation.
The Prompt/Tool Versions included in the Evaluation.
The Evaluator Versions used to evaluate.
The current status of the Evaluation.
"pending"
: The Evaluation has been created but is not actively being worked on by Humanloop."running"
: Humanloop is checking for any missing Logs and Evaluator Logs, and will generate them where appropriate."completed"
: All Logs an Evaluator Logs have been generated."cancelled"
: The Evaluation has been cancelled by the user. Humanloop will stop generating Logs and Evaluator Logs.Name of the Evaluation to help identify it. Must be unique among Evaluations associated with File.
Unique identifier for the File associated with the Evaluation.
URL to view the Evaluation on Humanloop.
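Putting the sketches together, the returned fields can be read directly once the Evaluation finishes; the JSON keys (`id`, `status`, `url`) are assumed to match the field names described above.

```python
# Continuing from the sketches above: wait for the Evaluation created earlier,
# then read the fields described in this section (assumed to be the JSON keys).
finished = wait_for_evaluation(evaluation["id"])
print(f"Evaluation {finished['id']} is {finished['status']}")
print(f"View it on Humanloop: {finished['url']}")
```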