Spot-check your Logs

How to create an Evaluation Run to review a sample of your Logs, ensuring your model generations remain high-quality.

By regularly reviewing a sample of your Prompt Logs, for example by having subject-matter experts (SMEs) rate them, you can gain valuable insight into how your Prompts perform in production.

For real-time observability (typically using code Evaluators), see our guide on setting up monitoring. This guide covers more detailed evaluations that are run on a small subset of your Logs.

Prerequisites

First you need to install and initialize the SDK. If you have already done this, skip to the next section.

Open up your terminal and follow these steps:

  1. Install the Humanloop SDK.
  2. Initialize the SDK with your Humanloop API key (you can get it from the Organization Settings page). A sketch of both steps follows below.
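
A minimal Python sketch of both steps, assuming you use the Python SDK (the TypeScript SDK follows the same pattern):

```python
# Install the SDK first, e.g.:
#   pip install humanloop

import os

from humanloop import Humanloop

# Initialize the client with your Humanloop API key.
# Here we assume the key is stored in the HUMANLOOP_API_KEY environment variable.
humanloop = Humanloop(api_key=os.environ["HUMANLOOP_API_KEY"])
```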

Set up an Evaluation

1. Create an Evaluation

Create an Evaluation for the Prompt. In this example, we also attach a “rating” Human Evaluator so our SMEs can judge the generated responses.
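
Assuming the Python SDK's `evaluations.create` method, a minimal sketch looks like the following. The Prompt and Evaluator paths are placeholders; replace them with the paths used in your workspace, and check the SDK reference if your version's parameters differ.

```python
# Create an Evaluation for the Prompt and attach a "rating" Human Evaluator
# so SMEs can judge the generated responses.
evaluation = humanloop.evaluations.create(
    name="Monthly spot-check",
    file={"path": "<your Prompt path>"},  # placeholder: path to your Prompt
    evaluators=[{"path": "<your rating Human Evaluator path>"}],  # placeholder
)
```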

2. Create a Run

Create a Run within the Evaluation. We will then attach Logs to this Run.
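
A sketch of creating the Run, assuming the SDK exposes a `create_run` method on `evaluations` (check the SDK reference if your version names it differently):

```python
# Create a Run within the Evaluation; the sampled Logs will be attached to it.
run = humanloop.evaluations.create_run(
    id=evaluation.id,
)
```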

3. Sample Logs

Sample a subset of your Logs to attach to the Run.

For this example, we’ll sample 100 Logs from the past 30 days, simulating a monthly spot-check.
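
A sketch of sampling the Logs, assuming the list Logs endpoint accepts `start_date` and `sample` filters (if your SDK version differs, list the Logs with the filters it supports and sample them yourself). The Prompt ID is a placeholder.

```python
from datetime import datetime, timedelta

# Sample 100 Logs from the past 30 days, simulating a monthly spot-check.
logs = humanloop.logs.list(
    file_id="<your Prompt ID>",  # placeholder: the ID of your Prompt
    sample=100,                  # assumed parameter: random sample size
    start_date=datetime.now() - timedelta(days=30),
)
log_ids = [log.id for log in logs]
```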

4. Attach Logs to the Run

Attach the sampled Logs to the Run you created earlier.
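
A sketch of attaching the Logs, assuming an `add_logs_to_run` method on `evaluations` that adds Logs to a Run by their IDs:

```python
# Attach the sampled Logs to the Run created earlier.
humanloop.evaluations.add_logs_to_run(
    id=evaluation.id,
    run_id=run.id,
    log_ids=log_ids,
)
```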

You have now created an Evaluation Run with a sample of Logs attached to it. In the Humanloop app, go to the Prompt’s Evaluations tab. You should see the new Evaluation named “Monthly spot-check”. Click on it to view the Run with the Logs attached.

Evaluation Run with Logs attached

Review your Logs

Rate the model generations via the Review tab.

For further details on how you can manage reviewing your Logs with multiple SMEs, see our guide on managing multiple reviewers.

Logs review

After your Logs have been reviewed, go to the Stats tab to view aggregate stats.

Aggregate run stats

Repeating the spot-check

To repeat this process the next time a spot-check is due, you can create a new Run within the same Evaluation, repeating the above steps from “Create a Run”. You will then see the new Run alongside the previous ones in the Evaluation, and can compare the aggregate stats across multiple Runs.

Next Steps