Create an Evaluation Run.

Optionally specify the Dataset and version to be evaluated. Humanloop will automatically start generating Logs and running Evaluators where orchestrated=true. If you are generating Logs yourself, you can set orchestrated=false and then generate and submit the required Logs via the API.
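For illustration, creating an orchestrated Run might look like the sketch below. It assumes the create endpoint is POST /evaluations/{id}/runs (mirroring the GET endpoint used for polling), that requests are authenticated with an X-API-KEY header against https://api.humanloop.com/v5, and that the Dataset and version are referenced by version ID; all IDs are placeholders, so check the request schema below for the exact shape.

```python
# Minimal sketch of creating an orchestrated Run. The endpoint path, base URL,
# auth header, and payload shapes are assumptions; all IDs are placeholders.
import os
import requests

API_BASE = "https://api.humanloop.com/v5"                 # assumed base URL
HEADERS = {"X-API-KEY": os.environ["HUMANLOOP_API_KEY"]}  # assumed auth header

evaluation_id = "ev_123"  # hypothetical Evaluation ID

resp = requests.post(
    f"{API_BASE}/evaluations/{evaluation_id}/runs",  # assumed create endpoint
    headers=HEADERS,
    json={
        "dataset": {"version_id": "dsv_123"},  # hypothetical Dataset version reference
        "version": {"version_id": "prv_456"},  # hypothetical Prompt version reference
        "orchestrated": True,  # Humanloop generates Logs and runs Evaluators
    },
)
resp.raise_for_status()
run = resp.json()
print(run["id"], run["status"])  # field names assumed from the response schema below
```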
If dataset and version are provided, you can set use_existing_logs=True to reuse existing Logs and avoid generating new Logs unnecessarily. Logs that are associated with the specified Version and have source_datapoint_id referencing a datapoint in the specified Dataset will be associated with the Run.
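Continuing the sketch above, reusing existing Logs is a matter of passing use_existing_logs alongside the Dataset and Version references; the payload shape is again an assumption.

```python
# Sketch: initialize the Run from existing Logs instead of generating new ones.
# Reuses API_BASE, HEADERS, and evaluation_id from the sketch above.
resp = requests.post(
    f"{API_BASE}/evaluations/{evaluation_id}/runs",
    headers=HEADERS,
    json={
        "dataset": {"version_id": "dsv_123"},  # hypothetical Dataset version reference
        "version": {"version_id": "prv_456"},  # hypothetical Prompt version reference
        "use_existing_logs": True,  # only valid when both dataset and version are provided
    },
)
resp.raise_for_status()
run = resp.json()
```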
To stay updated on the progress of the Run, you can poll it using the GET /evaluations/{id}/runs endpoint and check its status.
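A simple polling loop might look like the following; GET /evaluations/{id}/runs is the endpoint documented here, but the response shape and the terminal status values in the sketch are assumptions.

```python
# Sketch of polling until the Run reaches a terminal state, continuing the
# sketches above. The "runs" list in the response and the terminal status
# names are assumptions.
import time

def wait_for_run(evaluation_id: str, run_id: str, interval: float = 5.0) -> dict:
    while True:
        resp = requests.get(f"{API_BASE}/evaluations/{evaluation_id}/runs", headers=HEADERS)
        resp.raise_for_status()
        runs = resp.json()["runs"]  # assumed: response contains a list of Runs
        current = next(r for r in runs if r["id"] == run_id)
        if current["status"] in ("completed", "cancelled"):  # assumed terminal statuses
            return current
        time.sleep(interval)

final_run = wait_for_run(evaluation_id, run["id"])
```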
Unique identifier for Evaluation.
Dataset to use in this Run.
Version to use in this Run.
Whether the Run is orchestrated by Humanloop. If True, Humanloop will generate Logs for the Run; dataset and version must be provided. If False, a Log for the Prompt/Tool should be submitted by the user via the API.
If True, the Run will be initialized with existing Logs associated with the Dataset and Version. If False, the Run will be initialized with no Logs. Can only be set to True when both dataset and version are provided.
Successful Response
Unique identifier for the Run.
Whether the Run is orchestrated by Humanloop.
When the Run was added to the Evaluation.
When the Run was created.
The status of the Run.
Stats for other Runs will be displayed in comparison to the control Run.
The Dataset used in the Run.
The version used in the Run.
The User who created the Run.