January 11, 2024

Evaluation API enhancements

We’ve started the year by enhancing our Evaluations API to give you more flexibility: you can now self-host whichever parts of the evaluation workflow you need to run in your own infrastructure, while leaving the rest to us!

Mixing and matching the Humanloop runtime with self-hosting

Conceptually, evaluation runs have two components:

  1. Generation of logs for the datapoints using the version of the model you are evaluating.
  2. Evaluating those logs using Evaluators.

Now, using the Evaluations API, you can generate logs either within the Humanloop runtime or self-hosted in your own infrastructure (see our guide on external generations for evaluations).
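For a self-hosted generation step, you run the model over your datapoints yourself and post the resulting logs back to Humanloop. The sketch below uses Python's requests library; the base URL, endpoint path, field names, and IDs are illustrative assumptions rather than the exact API shape, so refer to the external generations guide for the real details.

```python
import requests

HUMANLOOP_API = "https://api.humanloop.com/v4"  # assumed base URL
HEADERS = {"X-API-KEY": "<YOUR_HUMANLOOP_API_KEY>"}

def post_external_log(evaluation_id: str, datapoint_id: str, output: str) -> str:
    """Post a log generated in your own infrastructure for one datapoint.

    The endpoint path and payload fields are illustrative, not the exact schema.
    """
    response = requests.post(
        f"{HUMANLOOP_API}/logs",
        headers=HEADERS,
        json={
            "evaluation_id": evaluation_id,
            "datapoint_id": datapoint_id,
            "output": output,
        },
    )
    response.raise_for_status()
    # Keep the returned log ID: you will need it to post evaluation results
    # for any External evaluators.
    return response.json()["id"]
```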

Similarly, evaluation of those logs can be performed either in the Humanloop runtime (using Evaluators that you can define in-app) or self-hosted (see our guide on self-hosted evaluations).
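For a self-hosted Evaluator, you compute the judgment in your own infrastructure and post the result back against the relevant log. A minimal sketch, again with an assumed endpoint and payload shape:

```python
import requests

HUMANLOOP_API = "https://api.humanloop.com/v4"  # assumed base URL
HEADERS = {"X-API-KEY": "<YOUR_HUMANLOOP_API_KEY>"}

def post_external_result(log_id: str, evaluator_id: str, judgment: float) -> None:
    """Post a result computed by your own (External) evaluator for a given log.

    Endpoint path and field names are assumptions for illustration.
    """
    response = requests.post(
        f"{HUMANLOOP_API}/evaluations/result",
        headers=HEADERS,
        json={
            "log_id": log_id,
            "evaluator_id": evaluator_id,
            "result": judgment,
        },
    )
    response.raise_for_status()
```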

You can now mix and match self-hosted and Humanloop-runtime logs and evaluations in any combination you wish.

When creating an Evaluation (via the improved UI dialog or via the API), you can set the new `hl_generated` flag to False to indicate that you will be posting the logs from your own infrastructure. You can then also include an Evaluator of type External to indicate that you will post the evaluation results yourself.
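As a rough sketch, creating such a run via the API could look like the following. The hl_generated flag and the External evaluator type are part of this release; the base URL, endpoint path, and other field names are assumptions for illustration.

```python
import requests

HUMANLOOP_API = "https://api.humanloop.com/v4"  # assumed base URL
HEADERS = {"X-API-KEY": "<YOUR_HUMANLOOP_API_KEY>"}

# Create an Evaluation whose logs will be posted from your own infrastructure
# (hl_generated=False) and scored by a self-hosted External evaluator.
response = requests.post(
    f"{HUMANLOOP_API}/evaluations",
    headers=HEADERS,
    json={
        "project_id": "<YOUR_PROJECT_ID>",
        "dataset_id": "<YOUR_DATASET_ID>",
        "hl_generated": False,
        "evaluators": [
            {"id": "<YOUR_EVALUATOR_ID>", "type": "external"},
        ],
    },
)
response.raise_for_status()
evaluation_id = response.json()["id"]  # use this when posting logs and results
```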

You can now also attach multiple Evaluators to any run, and these can be any combination of External (i.e. self-hosted) and Humanloop-runtime Evaluators.
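For example, the evaluators list in the creation payload could combine both kinds; the fragment below is hypothetical and only illustrates the shape of such a combination.

```python
# Hypothetical fragment: one Humanloop-runtime evaluator and one self-hosted
# External evaluator attached to the same evaluation run.
evaluators = [
    {"id": "<HUMANLOOP_RUNTIME_EVALUATOR_ID>"},             # runs on Humanloop
    {"id": "<EXTERNAL_EVALUATOR_ID>", "type": "external"},   # results posted by you
]
```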