Create Evaluation

POST /evaluations

Create an Evaluation.

Create a new Evaluation by specifying the Dataset, the versions to be evaluated (Evaluatees), and the Evaluators that will provide judgments.

Humanloop will automatically start generating Logs and running Evaluators where orchestrated=true. If you own the runtime for the Evaluatee or Evaluator, you can set orchestrated=false and then generate and submit the required logs using your runtime.
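As a minimal sketch, the request below creates an Evaluation over HTTP using Python's requests library. The base URL, the X-API-KEY header, and the placeholder version identifiers are assumptions for illustration; the exact shapes of the dataset, evaluatees, and evaluators objects are described in the Request section below.

```python
import os

import requests

# Assumed base URL and auth header for the Humanloop API; adjust for your account.
BASE_URL = "https://api.humanloop.com/v5"
HEADERS = {"X-API-KEY": os.environ["HUMANLOOP_API_KEY"]}

# Illustrative payload: the placeholder version IDs are hypothetical, and the
# precise object shapes are documented in the Request section.
payload = {
    "dataset": {"version_id": "<dataset version id>"},
    "evaluatees": [{"version_id": "<prompt or tool version id>"}],
    "evaluators": [
        # orchestrated=True asks Humanloop to run this Evaluator itself.
        {"version_id": "<evaluator version id>", "orchestrated": True},
    ],
    "name": "my-evaluation",
}

response = requests.post(f"{BASE_URL}/evaluations", json=payload, headers=HEADERS)
response.raise_for_status()
evaluation = response.json()
print(evaluation["id"], evaluation["status"])
```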

To track the progress of the Evaluation, poll it with the GET /evaluations/:id endpoint and check its status.
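Continuing from the sketch above, a simple polling loop might look like the following; the 5-second interval is arbitrary, and the evaluation_id would come from the create response (e.g. evaluation["id"]).

```python
import os
import time

import requests

BASE_URL = "https://api.humanloop.com/v5"  # assumed base URL, as in the sketch above
HEADERS = {"X-API-KEY": os.environ["HUMANLOOP_API_KEY"]}


def wait_for_evaluation(evaluation_id: str, interval_s: float = 5.0) -> dict:
    """Poll GET /evaluations/:id until the Evaluation reaches a terminal status."""
    while True:
        response = requests.get(f"{BASE_URL}/evaluations/{evaluation_id}", headers=HEADERS)
        response.raise_for_status()
        evaluation = response.json()
        # "completed" and "cancelled" are the terminal statuses listed in the Response section.
        if evaluation["status"] in ("completed", "cancelled"):
            return evaluation
        time.sleep(interval_s)
```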

Request

This endpoint expects an object.
dataset (object, Required)

Dataset to use in this Evaluation.

evaluators (list of objects, Required)

The Evaluators that will provide judgments.

evaluatees (list of objects, Optional)

Unique identifiers for the Prompt/Tool Versions to include in the Evaluation. Can be left unpopulated if you wish to add Evaluatees to this Evaluation by specifying evaluation_id in Log calls (see the logging sketch after this field list).

name (string, Optional)

Name of the Evaluation to help identify it. Must be unique within the associated File.

file (object, Optional)

The File to associate with the Evaluation.
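If you run the Evaluatee or Evaluator on your own runtime (orchestrated=false), or left evaluatees unpopulated, you attach Logs to the Evaluation by passing evaluation_id when logging. The sketch below assumes a Prompt log endpoint at POST /prompts/log; the endpoint path and most payload fields are assumptions for illustration, with evaluation_id being the field called out above.

```python
import os

import requests

BASE_URL = "https://api.humanloop.com/v5"  # assumed base URL
HEADERS = {"X-API-KEY": os.environ["HUMANLOOP_API_KEY"]}

# Hypothetical log payload: field names other than evaluation_id are illustrative.
log_payload = {
    "version_id": "<prompt version id>",  # the Evaluatee version this Log belongs to
    "evaluation_id": "<evaluation id>",   # attaches the Log to the Evaluation
    "inputs": {"question": "What is the capital of France?"},
    "output": "Paris",
}

response = requests.post(f"{BASE_URL}/prompts/log", json=log_payload, headers=HEADERS)
response.raise_for_status()
```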

Response

This endpoint returns an object.
id (string)

Unique identifier for the Evaluation. Starts with evr.

dataset (object)

The Dataset used in the Evaluation.

evaluatees (list of objects)

The Prompt/Tool Versions included in the Evaluation.

evaluators (list of objects)

The Evaluator Versions used to evaluate.

status (enum)
Allowed values: pending, running, completed, cancelled

The current status of the Evaluation.

  • "pending": The Evaluation has been created but is not actively being worked on by Humanloop.
  • "running": Humanloop is checking for any missing Logs and Evaluator Logs, and will generate them where appropriate.
  • "completed": All Logs an Evaluator Logs have been generated.
  • "cancelled": The Evaluation has been cancelled by the user. Humanloop will stop generating Logs and Evaluator Logs.
created_at (datetime)
updated_at (datetime)
name (string, Optional)

Name of the Evaluation to help identify it. Must be unique among Evaluations associated with the File.

file_id (string, Optional)

Unique identifier for the File associated with the Evaluation.

created_by (any, Optional)
url (string, Optional)

URL to view the Evaluation on Humanloop.

Errors