November 17, 2023

LLM Evaluators

Until now, running LLM-based evaluations meant writing Python code that used the Humanloop API to trigger the LLM generations.

Today, to make this increasingly important workflow simpler and more intuitive, we’re releasing LLM Evaluators, which require no Python configuration.

From the Evaluations page, click New Evaluator and select LLM Evaluator.

You can now choose between the existing Python Evaluators and our new LLM Evaluators.

Instead of a code editor, the right-hand side of the page is now a prompt editor for defining instructions to the LLM Evaluator. Underneath the prompt, you can configure the Evaluator’s parameters (model, temperature, etc.) just like any other model config.

LLM Evaluator Editor.

In the prompt editor, you have access to a variety of variables that correspond to data from the underlying Log that you are trying to evaluate. These use the usual {{ variable }} syntax, and include:

  • log_inputs - the input variables that were passed into the prompt template when the Log was generated
  • log_prompt - the fully populated prompt (if it was a completion mode generation)
  • log_messages - a JSON representation of the messages array (if it was a chat mode generation)
  • log_output - the output produced by the model
  • log_error - if the underlying Log was an unsuccessful generation, this is the error that was produced
  • testcase - when in offline mode, this is the testcase that was used for the evaluation
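
For example, an Evaluator prompt with a boolean return type might combine several of these variables like so (a sketch only - the exact instructions, and how strictly you constrain the response format, are up to you):

```
You are evaluating the output of an LLM app.

Inputs passed to the prompt template:
{{ log_inputs }}

Output produced by the model:
{{ log_output }}

Error (empty if the generation succeeded):
{{ log_error }}

Answer "true" if the output is a relevant, accurate response to the inputs
and no error occurred. Otherwise answer "false".
Respond with only "true" or "false".
```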

Take a look at some of the presets we’ve provided on the left-hand side of the page for inspiration.

LLM Evaluator presets. You'll likely need to tweak these to fit your use case.

At the bottom of the page, you can expand the debug console - this can be used to verify that your Evaluator is working as intended. We’ve got further enhancements coming to this part of the Evaluator Editor very soon.

Since an LLM Evaluator is just another model config managed within Humanloop, it gets its own project. When you create an LLM Evaluator, you’ll see that a new project is created in your organisation with the same name as the Evaluator. Every time the Evaluator produces a Log as part of its evaluation activity, that output will be visible in the Logs tab of that project.

Improved evaluator editor

Given our current focus on delivering a best-in-class evaluations experience, we’ve promoted the Evaluator editor to a full-page screen in the app.

In the left-hand pane, you’ll find drop-downs to:

  • Select the mode of the Evaluator - either Online or Offline, depending on whether the Evaluator is intended to run against live production Logs or against pre-defined testcases
  • Select the return type of the Evaluator - either boolean or number
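
The same return types apply to the existing Python Evaluators. As a rough illustration (the function signature and Log fields shown here are assumptions - use the template provided in the code editor as the canonical starting point), a boolean Python Evaluator might look like:

```python
# Hypothetical boolean Python Evaluator.
# Assumes the Evaluator is passed the Log under evaluation as a dict-like
# object with an "output" field; adjust to the fields your Logs contain.
def evaluator(log):
    output = (log.get("output") or "").strip()
    # Pass only if the model produced a non-empty output.
    return len(output) > 0
```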

Underneath that configuration you’ll find a collection of presets.

Preset selector.