Create a dataset

Datasets can be created from existing Logs, uploaded from a CSV file, or created via the API.

You can currently create Datasets in Humanloop in three ways: from existing logs, by uploading a CSV or via the API.

Create a Dataset from Logs

Prerequisites:

  • A Prompt in Humanloop
  • Some Logs available in that Prompt

To create a Dataset from existing Logs:

Go to the Logs tab

Select a subset of the Logs

Choose Add to Dataset

In the menu in the top right of the page, select Add to Dataset.

Select some logs and then click **Add to Dataset**

Add to a new or existing Dataset

Provide a name for the new Dataset and click Create, or click Add to existing dataset to append the selected Logs to a Dataset you already have.

Upload data from CSV

Prerequisites:

To create a dataset from a CSV file, we’ll first create a CSV in Google Sheets and then upload it to a dataset in Humanloop.

Create a CSV file.

  • In our Google Sheets example below, we have a column called user_query which is an input to a prompt variable of that name. So in our model config, we’ll need to include {{ user_query }} somewhere, and that placeholder will be populated with the value from the user_query input in the datapoint at generation-time.
  • You can include as many columns of prompt variables as you need for your model configs.
  • There is additionally a column called target which will populate the target of the datapoint. In this case, we use simple strings to define the target.
  • Note: messages are harder to incorporate into a CSV file as they tend to be verbose and hard-to-read JSON. If you want a dataset with messages, consider using the API to upload, or convert from existing logs.
A CSV file in Google Sheets defining a collection of 9 datapoints.
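If you would rather generate the CSV programmatically than in Google Sheets, the layout described above can be sketched with Python's standard `csv` module. The rows below are hypothetical illustrations, not the contents of the example sheet; the only names taken from this guide are the `user_query` and `target` column headers.

```python
import csv
import io

# Hypothetical example rows: one prompt-variable column plus a target column.
rows = [
    {"user_query": "How do I reset my password?", "target": "account"},
    {"user_query": "Can I export my data to CSV?", "target": "data export"},
]


def write_dataset_csv(rows, fieldnames):
    """Serialize rows to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = write_dataset_csv(rows, ["user_query", "target"])
print(csv_text)
```

To produce a file for upload, write `csv_text` to a path of your choice, for example with `pathlib.Path("dataset.csv").write_text(csv_text)`.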

Export the Google Sheet to CSV

Choose File > Download > Comma-separated values (.csv)

Create a new Dataset File

Click Upload CSV

Upload the CSV file you exported from Google Sheets by drag-and-drop or using the file explorer.

Uploading a CSV file to create a dataset.

Click Upload Dataset from CSV

You should see a new dataset appear in the datasets tab. Click it to explore its contents.

You’ll see a column with the input key-value pairs for each datapoint, a messages column (in our case we didn’t use messages, so they’re all empty) and a target column with the expected model output.
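To make the column-to-datapoint mapping concrete, here is a minimal sketch of how a CSV row could be split into inputs, messages and a target, and how an input could then fill a `{{ user_query }}` placeholder. The helpers `row_to_datapoint` and `fill_template` are hypothetical names written for this illustration, and the dictionary layout is an assumption based on the columns described above, not Humanloop's internal format or templating engine.

```python
import csv
import io
import re


def row_to_datapoint(row, target_column="target"):
    """Split a CSV row into inputs (every non-target column) and a target."""
    inputs = {key: value for key, value in row.items() if key != target_column}
    # Messages cannot easily be expressed in a CSV, so they stay empty here.
    return {"inputs": inputs, "messages": [], "target": row.get(target_column, "")}


def fill_template(template, inputs):
    """Replace each {{ name }} placeholder with the matching input value."""

    def substitute(match):
        name = match.group(1)
        # Leave the placeholder untouched if no input of that name exists.
        return str(inputs.get(name, match.group(0)))

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical one-row CSV using the column names from this guide.
csv_text = "user_query,target\nHow do I reset my password?,account\n"
datapoints = [row_to_datapoint(row) for row in csv.DictReader(io.StringIO(csv_text))]
prompt = fill_template("Classify this query: {{ user_query }}", datapoints[0]["inputs"])
print(prompt)
```

Every column other than `target` ends up as an input, which is why you can add as many prompt-variable columns as your model configs need.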

Upload via API

First define some sample data

This should consist of pairs of user messages and target extractions. This is where you could load any existing data you wish to use for your evaluation:

Python
# Example test case data
data = [
    {
        "messages": [
            {
                "role": "user",
                "content": "Hi Humanloop support team, I'm having trouble understanding how to use the evaluations feature in your software. Can you provide a step-by-step guide or any resources to help me get started?",
            }
        ],
        "target": {"feature": "evaluations", "issue": "needs step-by-step guide"},
    },
    {
        "messages": [
            {
                "role": "user",
                "content": "Hi there, I'm interested in fine-tuning a language model using your software. Can you explain the process and provide any best practices or guidelines?",
            }
        ],
        "target": {
            "feature": "fine-tuning",
            "issue": "process explanation and best practices",
        },
    },
]
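Before uploading, it can help to check that every datapoint has the shape used in the sample data. The following is a minimal sketch of such a check; `validate_datapoint` is a hypothetical helper, and the rules it enforces simply mirror the sample above (a non-empty `messages` list whose entries have string `role` and `content` fields, plus a `target`). They are assumptions, not an official schema.

```python
def validate_datapoint(datapoint):
    """Return a list of problems found in one datapoint; empty means none found."""
    problems = []
    messages = datapoint.get("messages")
    if not isinstance(messages, list) or not messages:
        problems.append("messages must be a non-empty list")
    else:
        for index, message in enumerate(messages):
            if not isinstance(message, dict):
                problems.append(f"messages[{index}] must be a dict")
                continue
            for key in ("role", "content"):
                if not isinstance(message.get(key), str):
                    problems.append(f"messages[{index}] is missing a string '{key}'")
    if "target" not in datapoint:
        problems.append("target is missing")
    return problems


# Hypothetical datapoints: one well-formed, one with a missing field and no target.
good = {
    "messages": [{"role": "user", "content": "How do I run an evaluation?"}],
    "target": {"feature": "evaluations", "issue": "needs guide"},
}
bad = {"messages": [{"role": "user"}]}

print(validate_datapoint(good))
print(validate_datapoint(bad))
```

Running the check over the whole list, for example `[validate_datapoint(d) for d in data]`, surfaces malformed entries before any API call is made.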

Then define a dataset and upload the datapoints

Python
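# Assumes the Humanloop Python SDK is installed; the API key and project ID
# below are placeholders to replace with your own values.
from humanloop import Humanloop

humanloop = Humanloop(api_key="YOUR_API_KEY")
project_id = "YOUR_PROJECT_ID"

# Create a dataset
dataset = humanloop.datasets.create(
    project_id=project_id,
    name="Sample dataset",
    description="Examples of feature requests extracted from user messages",
)
dataset_id = dataset.id

# Create datapoints for the dataset
datapoints = humanloop.datasets.create_datapoint(
    dataset_id=dataset_id,
    body=data,
)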

On the datasets tab in your Humanloop project, you will now see the dataset you just uploaded via the API.