
Spot-check your Logs

How to create an Evaluation Run to review a sample of your Logs, ensuring your model generations remain high-quality.


Regularly reviewing a sample of your Prompt Logs, for example through reviews by subject-matter experts (SMEs), gives you insight into how your Prompts perform in production.

For real-time observability (typically using code Evaluators), see our guide on setting up monitoring. This guide covers more detailed evaluations that run on a small subset of Logs.

Prerequisites

  • You have a Prompt with Logs. See our guide on logging to a Prompt if you don’t yet have one.
  • You have a Human Evaluator set up. See our guide on creating a Human Evaluator if you don’t yet have one.
Install and initialize the SDK

First you need to install and initialize the SDK. If you have already done this, skip to the next section.

Open up your terminal and follow these steps:

  1. Install the Humanloop SDK:
pip install humanloop
  2. Initialize the SDK with your Humanloop API key (you can get it from the Organization Settings page).
from humanloop import Humanloop
humanloop = Humanloop(api_key="<YOUR HUMANLOOP KEY>")

# Check that the authentication was successful
print(humanloop.prompts.list())
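
If you prefer not to hard-code the key in your script, one option is to read it from an environment variable. The sketch below is illustrative only: the helper `load_api_key` and the variable name `HUMANLOOP_API_KEY` are assumptions made for this example, not names defined by this guide.

```python
import os


def load_api_key(env=None, var_name="HUMANLOOP_API_KEY"):
    """Return the API key stored in `var_name`, or raise if it is missing.

    `var_name` is a hypothetical environment variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your Humanloop API key"
        )
    return key
```

With this helper in place, the client could be created as `Humanloop(api_key=load_api_key())`.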

Set up an Evaluation

1. Create an Evaluation

Create an Evaluation for the Prompt. In this example, we also attach a “rating” Human Evaluator so our SMEs can judge the generated responses.

evaluation = humanloop.evaluations.create(
    # Name your Evaluation
    name="Monthly spot-check",
    file={
        # Replace this with the ID of your Prompt.
        # You can specify a Prompt by "path" as well.
        "id": "pr_..."
    },
    evaluators=[
        # Attach Evaluator to enable SMEs to rate the generated responses
        {"path": "Example Evaluators/Human/rating"},
    ],
)
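
The `file` argument above identifies the Prompt by `"id"`, and the code comment notes that `"path"` works as well. As a small sketch of that choice, the hypothetical helper `prompt_ref` below builds the dictionary from whichever identifier you have and rejects ambiguous input. The helper name and the example path are assumptions for illustration.

```python
def prompt_ref(prompt_id=None, path=None):
    """Build the `file` argument from either a Prompt ID or a Prompt path."""
    # Exactly one of the two identifiers must be provided.
    if (prompt_id is None) == (path is None):
        raise ValueError("Provide exactly one of prompt_id or path")
    return {"id": prompt_id} if prompt_id is not None else {"path": path}


# "My Project/My Prompt" is a hypothetical path used only for illustration.
file_arg = prompt_ref(path="My Project/My Prompt")
print(file_arg)
```

The result can be passed as `file=file_arg` when creating the Evaluation.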

2. Create a Run

Create a Run within the Evaluation. We will then attach Logs to this Run.

run = humanloop.evaluations.create_run(
    id=evaluation.id,
)

3. Sample Logs

Sample a subset of your Logs to attach to the Run.

For this example, we’ll sample 100 Logs from the past 30 days, simulating a monthly spot-check.

import datetime

logs = humanloop.logs.list(
    file_id="pr_...",  # Replace with the ID of the Prompt
    sample=100,
    # Example filter to sample Logs from the past 30 days
    start_date=datetime.datetime.now() - datetime.timedelta(days=30),
)

log_ids = [log.id for log in logs]
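
The 30-day window used as the `start_date` filter can be made explicit and reusable. The following is a minimal sketch, assuming you want the window expressed in UTC. The helper name `spot_check_window` is hypothetical and not part of the SDK.

```python
import datetime


def spot_check_window(days=30, now=None):
    """Return (start, end) datetimes covering the last `days` days, in UTC."""
    if days <= 0:
        raise ValueError("days must be positive")
    # Allow `now` to be injected so the window is easy to reproduce or test.
    end = now if now is not None else datetime.datetime.now(datetime.timezone.utc)
    start = end - datetime.timedelta(days=days)
    return start, end


start_date, end_date = spot_check_window(days=30)
```

`start_date` can then be supplied as the `start_date` filter when listing Logs, in place of the inline expression.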

4. Attach Logs to the Run

Attach the sampled Logs to the Run you created earlier.

humanloop.evaluations.add_logs_to_run(
    id=evaluation.id,
    run_id=run.id,
    log_ids=log_ids,
)

You have now created an Evaluation Run with a sample of Logs attached to it. In the Humanloop app, go to the Prompt’s Evaluations tab. You should see the new Evaluation named “Monthly spot-check”. Click on it to view the Run with the Logs attached.

Evaluation Run with Logs attached

Review your Logs

Rate the model generations via the Review tab.

For details on coordinating reviews across multiple SMEs, see our guide on managing multiple reviewers.

Logs review

After your Logs have been reviewed, go to the Stats tab to view aggregate stats.

Aggregate run stats

Repeating the spot-check

To repeat this process the next time a spot-check is due, you can create a new Run within the same Evaluation, repeating the above steps from “Create a Run”. You will then see the new Run alongside the previous ones in the Evaluation, and can compare the aggregate stats across multiple Runs.
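
The repeated steps can be wrapped in a single function so each spot-check is one call. This is a sketch rather than an official helper: `run_spot_check` is a hypothetical name, it uses only the SDK calls shown earlier in this guide, and skipping the attach step when no Logs are returned is an assumption made for this example.

```python
import datetime


def run_spot_check(client, evaluation_id, file_id, sample_size=100, days=30):
    """Create a new Run in an existing Evaluation and attach a fresh sample of Logs.

    `client` is expected to be an initialized Humanloop client.
    Returns the new Run and the list of attached Log IDs.
    """
    # Create a Run within the existing Evaluation.
    run = client.evaluations.create_run(id=evaluation_id)

    # Sample Logs from the past `days` days.
    logs = client.logs.list(
        file_id=file_id,
        sample=sample_size,
        start_date=datetime.datetime.now() - datetime.timedelta(days=days),
    )
    log_ids = [log.id for log in logs]

    # Attach the sampled Logs to the new Run (skipped if nothing was sampled).
    if log_ids:
        client.evaluations.add_logs_to_run(
            id=evaluation_id,
            run_id=run.id,
            log_ids=log_ids,
        )
    return run, log_ids
```

For the next monthly check you could call `run_spot_check(humanloop, evaluation.id, "pr_...")`, replacing `"pr_..."` with the ID of your Prompt.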

Next Steps

  • If you have performed a spot-check and identified issues, you can iterate on your Prompts in the app and run further Evaluations to verify improvements.