For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
Sign inBook a demo
DocsReferenceChangelog
DocsReferenceChangelog
  • Getting Started
    • Overview
    • Quickstart
  • Explanation
    • Integrating Humanloop
      • Files
      • Prompts
      • Agents
      • Evaluators
      • Tools
      • Flows
      • Datasets
      • Logs
      • Directories
      • Environments
  • Tutorials
    • Evaluate an Agent in the UI
    • Evaluate an Agent in code
    • Evaluate a RAG app
    • Capture user feedback
  • How-To Guides
    • Migrating from Humanloop
  • Reference
    • Deployment Options
    • Supported Models
    • Template Library
    • Vercel AI SDK
    • .prompt and .agent Files
    • Humanloop Runtime Environment
    • Security and Compliance
    • Data Management
    • Access roles (RBACs)
    • SSO and Authentication
    • LLMs.txt
LogoLogo
Sign inBook a demo
On this page
  • Versioning
  • Creating a Dataset
  • Using Datasets for Evaluations
ExplanationKey Concepts

Datasets

Datasets are collections of Datapoints used for evaluation and fine-tuning.

Was this page helpful?
Previous

Logs

Logs contain the inputs and outputs of each time a Function File is called.
Next
Built with

Datasets on Humanloop are collections of Datapoints used for evaluation and fine-tuning. You can think of a Datapoint as a test case for your AI application, which contains the following fields:

  • Inputs: a collection of prompt variable values that replace the {{variables}} defined in your prompt template during generation.
  • Messages: for chat models, you can have a history of chat messages that are appended to the prompt during generation.
  • Target: a value that in its simplest form describes the desired output string for the given inputs and messages history. For more advanced use cases, you can define a JSON object containing whatever fields are necessary to evaluate the model’s output.
A Datapoint slideover showing its inputs, messages, and target fields.

Versioning

A Dataset will have multiple Versions as you iterate on the test cases for your task. This tends to be an evolving process as you learn how your Prompts behave and how users interact with your AI application in the wild.

Dataset Versions are immutable and are uniquely defined by the contents of the Datapoints. When you change, add, or remove Datapoints, this constitutes a new Version.

Each Evaluation is linked to a specific Dataset Version, ensuring that your evaluation results are always traceable to the exact set of test cases used.

Creating a Dataset

A Dataset can be created in the following ways:

  • Upload a CSV in the UI
  • Create a Datapoint from an existing Log
  • Create a Dataset via the API

Using Datasets for Evaluations

Datasets are foundational for Evaluations on Humanloop. Evaluations are run by iterating over the Datapoints in a Dataset, generating output from different versions of your AI application for each one. The Datapoints provide the specific test cases to evaluate, with each containing the input variables and optionally a target output that defines the desired behavior. When a target is specified, Evaluators can compare the generated outputs to the targets to assess how well each version performed.

Run an Evaluation

Get started with using Datasets for Evaluation via UI

Run an Evaluation

Get started with using Datasets for Evaluation via code