Create a Dataset

In this guide, we will walk through the different ways to create Datasets on Humanloop.

Datasets are a collection of input-output pairs that can be used to evaluate your Prompts, Tools or even Evaluators.

This guide will show you how to create Datasets in Humanloop in three different ways:

  • Create a Dataset from existing Logs - useful for curating Datasets based on how your AI application has been behaving in the wild.
  • Upload data from CSV - useful for quickly uploading existing tabular data you’ve collected outside of Humanloop.
  • Upload via API - useful for uploading more complex Datasets that may have nested JSON structures, which are difficult to represent in tabular .CSV format, and for integrating with your existing data pipelines.

Create a Dataset from Logs

Prerequisites

You should have an existing Prompt on Humanloop and already generated some Logs. Follow our guide on creating a Prompt.

Steps

To create a Dataset from existing Logs:

2

Select a subset of the Logs to add

Filter logs on a criteria of interest, such as the version of the Prompt used, then multi-select Logs.

In the menu in the top right of the page, select Add to dataset.

Filter and select logs of interest.
3

Add to a new Dataset

Provide a name of the new Dataset and click Create (or you can click add to existing Dataset to append the selection to an existing Dataset). Then provide a suitable commit message describing the datapoints you’ve added.

Create a new dataset from logs.

You will then see the new Dataset appear at the same level in the filesystem as your Prompt.

Upload a Dataset from CSV

Prerequisites

You should have an existing Prompt on Humanloop with a variable defined with our double curly bracket syntax {{variable}}. If not, first follow our guide on creating a Prompt.

In this example, we’ll use a Prompt that categorises user queries about Humanloop’s product and docs by which feature they relate to.

An example Prompt with a variable `{{query}}`.

Steps

To create a dataset from a CSV file, we’ll first create a CSV in Google Sheets that contains values for our Prompt variable {{query}} and then upload it to a Dataset on Humanloop.

1

Create a CSV file.

  • In our Google Sheets example below, we have a column called query which contains possible values for our Prompt variable {{query}}. You can include as many columns as you have variables in your Prompt template.
  • There is additionally a column called target which will populate the target output for the classifier Prompt. In this case, we use simple strings to define the target.
  • More complex Datapoints that contain messages and structured objects for targets are suppoerted, but are harder to incorporate into a CSV file as they tend to be hard-to-read JSON. If you need more complex Datapoints, use the API instead.
A CSV file in Google Sheets defining query and taget pairs for our Classifier Prompt.
2

Export the Google Sheet to CSV

In Google sheets, choose FileDownloadComma-separated values (.csv)

3

Create a new Dataset File

On Humanloop, select New at the bottom of the left hand sidebar, then select Dataset.

Creat a new File from the sidebar on Humanloop.
4

Click Upload CSV

First name your dataset when prompted in the sidebar, then select the Upload CSV button and drag and drop the CSV file you created above using the file explorer. You will then be prompted to provide a commit message to describe the initial state of the dataset.

Uploading a CSV file to create a dataset.

Upload a Dataset via API

Prerequisites

If you are using the SDK, the only prerequisite is to have the SDK installed and configured. If you are using the API directly, you will need to have an API key.

Steps

Using the API is a great way to integrate Humanloop with your existing data pipeline or just to once-off upload a more complex Dataset that is hard to represent in a CSV file, such as one that contains an array of messages and JSON targets.

1

Post data to the Datasets API

We first define some sample data that contains user messages and desired responses from our Support Agent Prompt and call the POST /datasets endpoint to upload it as follows:

2

Inspect the uploaded Dataset

After running this code, in your Humanloop workspace you will now see a Dataset called Support Query Ground Truth (or whatever value was in path) with your sample data.

Inspect the Dataset uploaded via API.

Next steps

🎉 Now that you have Datasets defined in Humanloop, you can leverage our Evaluations feature to systematically measure and improve the performance of your AI applications. See our guides on setting up Evaluators and Running an Evaluation to get started.