Spot-check your Logs

How to create an Evaluation Run to review a sample of your Logs, ensuring your model generations remain high-quality.

Regularly reviewing a sample of your Prompt Logs, for example by having subject-matter experts (SMEs) rate them, shows you how your Prompts are performing in production.

For real-time observability (typically using code Evaluators), see our guide on setting up monitoring. This guide covers more detailed evaluations that run on a small subset of your Logs.

Prerequisites

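First, install and initialize the SDK. If you have already done this, skip to the next section. Otherwise, open your terminal and follow the steps below. Note that the setup shown here uses the TypeScript SDK, while the steps in the rest of this guide are written in Python; if you follow them as written, install and initialize the Python SDK instead.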

  1. Install the Humanloop TypeScript SDK:

    $ npm install humanloop

  2. Import and initialize the SDK:

    import { HumanloopClient, Humanloop } from "humanloop";

    const humanloop = new HumanloopClient({ apiKey: "YOUR_API_KEY" });

    // Check that the authentication was successful
    console.log(await humanloop.prompts.list());

Set up an Evaluation

1. Create an Evaluation

Create an Evaluation for the Prompt. In this example, we also attach a “rating” Human Evaluator so our SMEs can judge the generated responses.

    evaluation = humanloop.evaluations.create(
        # Name your Evaluation
        name="Monthly spot-check",
        file={
            # Replace this with the ID of your Prompt.
            # You can specify a Prompt by "path" as well.
            "id": "pr_..."
        },
        evaluators=[
            # Attach Evaluator to enable SMEs to rate the generated responses
            {"path": "Example Evaluators/Human/rating"},
        ],
    )
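
The comment in the call above notes that a Prompt can be referenced by ID or by path. As a minimal sketch, the hypothetical helper below (`prompt_file_ref` is not part of the SDK) builds the `file` argument from whichever one you have. Requiring exactly one of the two is a design choice of this sketch to avoid ambiguity, not a documented API rule, and the example path is made up.

```python
def prompt_file_ref(prompt_id=None, path=None):
    """Build the `file` argument from a Prompt ID or a Prompt path.

    Hypothetical helper: it only assembles a dictionary and makes no API call.
    """
    # Reject the ambiguous cases: both values given, or neither.
    if (prompt_id is None) == (path is None):
        raise ValueError("Provide exactly one of prompt_id or path")
    if prompt_id is not None:
        return {"id": prompt_id}
    return {"path": path}


# Example with a made-up path; pass the result as `file=...` when creating the Evaluation.
file_ref = prompt_file_ref(path="My Project/My Prompt")
```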

2. Create a Run

Create a Run within the Evaluation. We will then attach Logs to this Run.

    run = humanloop.evaluations.create_run(
        id=evaluation.id,
    )

3. Sample Logs

Sample a subset of your Logs to attach to the Run.

For this example, we’ll sample 100 Logs from the past 30 days, simulating a monthly spot-check.

    import datetime

    logs = humanloop.logs.list(
        file_id="pr_...",  # Replace with the ID of the Prompt
        sample=100,
        # Example filter to sample Logs from the past 30 days
        start_date=datetime.datetime.now() - datetime.timedelta(days=30),
    )

    log_ids = [log.id for log in logs]
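
If you run spot-checks on different schedules, it may help to compute the start of the sampling window in one place. The sketch below is a hypothetical helper (`spot_check_window_start` is not an SDK function) that reproduces the date arithmetic of the filter above and accepts an explicit `now` so the result is reproducible. Like the example above, it uses naive local datetimes.

```python
import datetime


def spot_check_window_start(days=30, now=None):
    """Return the start of a look-back window of `days` days ending at `now`."""
    if days <= 0:
        raise ValueError("days must be positive")
    if now is None:
        # Same default as the example above: the current local time.
        now = datetime.datetime.now()
    return now - datetime.timedelta(days=days)


# A 30-day window ending on 31 March 2024 starts on 1 March 2024.
example_start = spot_check_window_start(30, now=datetime.datetime(2024, 3, 31))
```

The returned value can be passed as `start_date` when listing Logs, in place of the inline expression.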

4. Attach Logs to the Run

Attach the sampled Logs to the Run you created earlier.

    humanloop.evaluations.add_logs_to_run(
        id=evaluation.id,
        run_id=run.id,
        log_ids=log_ids,
    )
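
For larger samples, you might prefer to remove duplicate IDs and attach the Logs in smaller batches. This guide does not say whether the endpoint limits the number of IDs per call, or whether it can be called repeatedly for the same Run, so treat both as assumptions. The helper name and the default batch size below are hypothetical.

```python
def dedupe_and_chunk(log_ids, batch_size=50):
    """Yield lists of at most `batch_size` unique IDs, in first-seen order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # dict.fromkeys drops duplicates while preserving insertion order.
    unique_ids = list(dict.fromkeys(log_ids))
    for start in range(0, len(unique_ids), batch_size):
        yield unique_ids[start:start + batch_size]


# Each batch could then be passed as `log_ids` in its own add_logs_to_run call
# (assuming repeated calls on the same Run are allowed).
batches = list(dedupe_and_chunk(["log_a", "log_b", "log_a", "log_c"], batch_size=2))
```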

You have now created an Evaluation Run with a sample of Logs attached to it. In the Humanloop app, go to the Prompt’s Evaluations tab. You should see the new Evaluation named “Monthly spot-check”. Click on it to view the Run with the Logs attached.

Figure: Evaluation Run with Logs attached

Review your Logs

Rate the model generations via the Review tab.

For further details on how you can manage reviewing your Logs with multiple SMEs, see our guide on managing multiple reviewers.

Figure: Logs review

After your Logs have been reviewed, go to the Stats tab to view aggregate stats.

Figure: Aggregate run stats

Repeating the spot-check

To repeat this process the next time a spot-check is due, you can create a new Run within the same Evaluation, repeating the above steps from “Create a Run”. You will then see the new Run alongside the previous ones in the Evaluation, and can compare the aggregate stats across multiple Runs.
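
One way to make the repeat easy is to wrap the steps from “Create a Run” onward in a single function. The sketch below assumes `client` is an initialized Humanloop client exposing the same three calls used earlier in this guide. The function name, its parameters, and its return value are hypothetical.

```python
import datetime


def run_spot_check(client, evaluation_id, file_id, sample_size=100, days=30):
    """Create a new Run in an existing Evaluation and attach sampled Logs."""
    # Create a Run within the existing Evaluation.
    run = client.evaluations.create_run(id=evaluation_id)

    # Sample Logs from the look-back window.
    logs = client.logs.list(
        file_id=file_id,
        sample=sample_size,
        start_date=datetime.datetime.now() - datetime.timedelta(days=days),
    )
    log_ids = [log.id for log in logs]

    # Attach the sampled Logs to the new Run.
    client.evaluations.add_logs_to_run(
        id=evaluation_id,
        run_id=run.id,
        log_ids=log_ids,
    )
    return run, log_ids
```

Calling `run_spot_check(humanloop, evaluation.id, "pr_...")` each time a spot-check is due would add one new Run to the same Evaluation, so the Runs can be compared side by side.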
