August 14, 2024

Human Evaluator upgrades

We’ve made significant upgrades to Human Evaluators and related workflows to make it easier to gather human judgments (sometimes referred to as “feedback”) when assessing the quality of your AI applications.

Here are some of the key improvements:

  • Instead of defining a limited feedback schema tied to the settings of a specific Prompt, you can now define your schema in a Human Evaluator file and reuse it across multiple Prompts and Tools, for both monitoring and offline evaluation.
  • You are no longer restricted to the default types of Rating, Actions, and Issues when defining your feedback schemas from the UI. We’ve introduced a more flexible Editor interface that supports different return types and valence controls.
  • We’ve extended the scope of Human Evaluators so that they can now also be used with Tools and other Evaluators (useful for validating AI judgments) in the same way as with Prompts.
  • We’ve improved the Logs drawer UI for applying feedback to Logs. In particular, we’ve made the buttons more responsive.

To set up a Human Evaluator, create a new file. Within the file creation dialog, click on Evaluator, then click on Human. This will create a new Human Evaluator file and bring you to its Editor. Here, you can choose a Return type for the Evaluator and configure the feedback schema.

[Image: Tone Evaluator set up with options and instructions]
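For illustration, here is a minimal sketch of what the feedback schema behind a Human Evaluator like the Tone example above might look like. The field names and structure are assumptions for the sake of the example; they do not reflect the exact configuration format used by the Editor or the Humanloop API.

```python
# Illustrative only: a plain-Python sketch of a Human Evaluator feedback schema,
# mirroring the fields you configure in the Editor (return type, options, valence,
# instructions). Field names here are assumptions, not the actual Humanloop API.

tone_evaluator_schema = {
    "name": "Tone",
    "return_type": "select",  # e.g. a single choice from a fixed set of options
    "instructions": "Rate the tone of the model's response.",
    "options": [
        {"label": "Friendly", "valence": "positive"},
        {"label": "Neutral", "valence": "neutral"},
        {"label": "Rude", "valence": "negative"},
    ],
}

# A human judgment applied to a Log would then reference one of the defined options.
example_judgment = {"evaluator": "Tone", "value": "Friendly"}

if __name__ == "__main__":
    allowed = {option["label"] for option in tone_evaluator_schema["options"]}
    assert example_judgment["value"] in allowed
    print(
        f"Recorded '{example_judgment['value']}' judgment for the "
        f"{tone_evaluator_schema['name']} Evaluator."
    )
```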

You can then reference this Human Evaluator within the Monitoring dropdown of Prompts, Tools, and other Evaluators, as well as when configuring reports in their Evaluations tab.

We’ve set up default Rating and Correction Evaluators that will be automatically attached to all new and existing Prompts. We’ve also migrated all of your existing Prompt-specific feedback schemas to Human Evaluator files, and these will continue to work as before with no disruption.

Check out our updated documentation for further details on how to use Human Evaluators: