Create a dataset
Datasets can be created from existing logs or uploaded from CSV and via the API.
You can currently create Datasets in Humanloop in three ways: from existing logs, by uploading a CSV or via the API.
Create a Dataset from Logs
Prerequisites:
To create a Dataset from existing Logs:
Upload data from CSV
Prerequisites:
- A Prompt in Humanloop
To create a dataset from a CSV file, we’ll first create a CSV in Google Sheets and then upload it to a dataset in Humanloop.
Create a CSV file.
- In our Google Sheets example below, we have a column called
user_query
which is an input to a prompt variable of that name. So in our model config, we’ll need to include{{ user_query }}
somewhere, and that placeholder will be populated with the value from theuser_query
input in the datapoint at generation-time. - You can include as many columns of prompt variables as you need for your model configs.
- There is additionally a column called
target
which will populate the target of the datapoint. In this case, we use simple strings to define the target. - Note:
messages
are harder to incorporate into a CSV file as they tend to be verbose and hard-to-read JSON. If you want a dataset with messages, consider using the API to upload, or convert from existing logs.
Upload via API
Install and initialize the SDK
TypeScript
Python
First you need to install and initialize the SDK. If you have already done this, skip to the next section. Otherwise, open up your terminal and follow these steps:
-
Install the Humanloop TypeScript SDK:
-
Import and initialize the SDK:
On the datasets tab in your Humanloop project you will now see the dataset you just uploaded via the API.