Spot-check your Logs

How to create an Evaluation Run to review a sample of your Logs, ensuring your model generations remain high-quality.

By regularly reviewing a sample of your Prompt Logs, for example by having subject-matter experts (SMEs) rate them, you can gain valuable insight into how your Prompts perform in production.

For real-time observability (typically using code Evaluators), see our guide on setting up monitoring. This guide covers more detailed evaluations that are run on a small subset of your Logs.

Prerequisites

First you need to install and initialize the SDK. If you have already done this, skip to the next section.

Open up your terminal and follow these steps:

  1. Install the Humanloop SDK.
  2. Initialize the SDK with your Humanloop API key (you can get it from the Organization Settings page). A sketch of both steps follows below.
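
A minimal Python sketch of both steps, assuming you use the Python SDK (the TypeScript SDK follows the same pattern):

```python
# Install the SDK first, e.g.:
#   pip install humanloop

import os

from humanloop import Humanloop

# Initialize the client with your Humanloop API key.
# Here we assume the key is stored in the HUMANLOOP_API_KEY environment variable.
humanloop = Humanloop(api_key=os.environ["HUMANLOOP_API_KEY"])
```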

Set up an Evaluation

1. Create an Evaluation

Create an Evaluation for the Prompt. In this example, we also attach a “rating” Human Evaluator so our SMEs can judge the generated responses.
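
Assuming the Python SDK's `evaluations.create` method, a minimal sketch looks like the following. The Prompt and Evaluator paths are placeholders; replace them with the paths used in your workspace, and check the SDK reference if your version's parameters differ.

```python
# Create an Evaluation for the Prompt and attach a "rating" Human Evaluator
# so SMEs can judge the generated responses.
evaluation = humanloop.evaluations.create(
    name="Monthly spot-check",
    file={"path": "<your Prompt path>"},  # placeholder: path to your Prompt
    evaluators=[{"path": "<your rating Human Evaluator path>"}],  # placeholder
)
```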

2. Create a Run

Create a Run within the Evaluation. We will then attach Logs to this Run.
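
A sketch of creating the Run, assuming the SDK exposes a `create_run` method on `evaluations` (check the SDK reference if your version names it differently):

```python
# Create a Run within the Evaluation; the sampled Logs will be attached to it.
run = humanloop.evaluations.create_run(
    id=evaluation.id,
)
```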

3. Sample Logs

Sample a subset of your Logs to attach to the Run.

For this example, we’ll sample 100 Logs from the past 30 days, simulating a monthly spot-check.
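
A sketch of sampling the Logs, assuming the list Logs endpoint accepts `start_date` and `sample` filters (if your SDK version differs, list the Logs with the filters it supports and sample them yourself). The Prompt ID is a placeholder.

```python
from datetime import datetime, timedelta

# Sample 100 Logs from the past 30 days, simulating a monthly spot-check.
logs = humanloop.logs.list(
    file_id="<your Prompt ID>",  # placeholder: the ID of your Prompt
    sample=100,                  # assumed parameter: random sample size
    start_date=datetime.now() - timedelta(days=30),
)
log_ids = [log.id for log in logs]
```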

4. Attach Logs to the Run

Attach the sampled Logs to the Run you created earlier.
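
A sketch of attaching the Logs, assuming an `add_logs_to_run` method on `evaluations` that adds Logs to a Run by their IDs:

```python
# Attach the sampled Logs to the Run created earlier.
humanloop.evaluations.add_logs_to_run(
    id=evaluation.id,
    run_id=run.id,
    log_ids=log_ids,
)
```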

You have now created an Evaluation Run with a sample of Logs attached to it. In the Humanloop app, go to the Prompt’s Evaluations tab. You should see the new Evaluation named “Monthly spot-check”. Click on it to view the Run with the Logs attached.

Evaluation Run with Logs attached

Review your Logs

Rate the model generations via the Review tab.

For further details on how you can manage reviewing your Logs with multiple SMEs, see our guide on managing multiple reviewers.

Logs review

After your Logs have been reviewed, go to the Stats tab to view aggregate stats.

Aggregate run stats

Repeating the spot-check

To repeat this process the next time a spot-check is due, you can create a new Run within the same Evaluation, repeating the above steps from “Create a Run”. You will then see the new Run alongside the previous ones in the Evaluation, and can compare the aggregate stats across multiple Runs.

Next Steps