Create a dataset
Datasets can be created from existing Logs, uploaded from a CSV file, or created via the API.
You can currently create Datasets in Humanloop in three ways: from existing logs, by uploading a CSV or via the API.
Create a Dataset from Logs
Prerequisites:
To create a Dataset from existing Logs:
Upload data from CSV
Prerequisites:
- A Prompt in Humanloop
To create a dataset from a CSV file, we’ll first create a CSV in Google Sheets and then upload it to a dataset in Humanloop.
Create a CSV file.
- In our Google Sheets example below, we have a column called user_query, which is an input to a prompt variable of that name. So in our model config, we'll need to include {{ user_query }} somewhere, and that placeholder will be populated with the value from the user_query input in the datapoint at generation time.
- You can include as many columns of prompt variables as you need for your model configs.
- There is additionally a column called target, which will populate the target of the datapoint. In this case, we use simple strings to define the target.
- Note: messages are harder to incorporate into a CSV file as they tend to be verbose and hard-to-read JSON. If you want a dataset with messages, consider using the API to upload, or convert from existing logs.
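If you would rather generate the CSV from a script than from Google Sheets, the layout described above can be sketched with Python's standard library. This is a minimal illustration, not Humanloop code: the example rows, the helper names write_dataset_csv and fill_template, and the prompt template text are all assumptions made for this sketch. Only the user_query and target column names and the {{ user_query }} placeholder come from the guide.

```python
import csv
import io
import re

# Hypothetical example rows; only the column names come from the guide.
rows = [
    {"user_query": "How do I reset my password?",
     "target": "Go to Settings and choose Reset password."},
    {"user_query": "What are your opening hours?",
     "target": "We are open 9am to 5pm on weekdays."},
]


def write_dataset_csv(rows, fileobj):
    """Write one CSV row per datapoint: an input column plus a target column."""
    writer = csv.DictWriter(fileobj, fieldnames=["user_query", "target"])
    writer.writeheader()
    writer.writerows(rows)


def fill_template(template, inputs):
    """Replace each {{ name }} placeholder with the matching input value.

    This only illustrates how a placeholder lines up with a column name;
    it is not how Humanloop itself renders prompts.
    """
    return re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda match: str(inputs[match.group(1)]),
        template,
    )


# Write to an in-memory buffer; pass a real file object to save to disk.
buffer = io.StringIO()
write_dataset_csv(rows, buffer)
csv_text = buffer.getvalue()

# Read the CSV back and fill a hypothetical template from the first row.
datapoints = list(csv.DictReader(io.StringIO(csv_text)))
prompt = fill_template("Answer the customer question: {{ user_query }}", datapoints[0])
```

To save the file for upload, open a path with open("dataset.csv", "w", newline="") and pass that file object to write_dataset_csv instead of the in-memory buffer.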
Upload via API
On the datasets tab in your Humanloop project you will now see the dataset you just uploaded via the API.
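Before sending data through the API, each row has to be shaped as a datapoint. The following is a minimal sketch of that preparation step only, assuming a datapoint can be represented as a JSON object with an inputs mapping and a target value. The field names inputs and target, the helper rows_to_datapoints, and the sample CSV text are assumptions for illustration; the actual request schema and endpoint should be taken from the Humanloop API reference, and no API call is made here.

```python
import csv
import io
import json


def rows_to_datapoints(csv_text, target_column="target"):
    """Turn CSV text into a list of datapoint-shaped dicts.

    Every column except the target column is treated as a prompt input.
    The {"inputs": ..., "target": ...} shape is an assumption for this sketch.
    """
    datapoints = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        target = row.pop(target_column, None)
        datapoints.append({"inputs": dict(row), "target": target})
    return datapoints


# Hypothetical CSV content using the column names from the CSV section.
csv_text = (
    "user_query,target\n"
    "How do I reset my password?,Use the reset link.\n"
)

# Serialize to JSON, ready to be placed in a request body.
payload = json.dumps(rows_to_datapoints(csv_text), indent=2)
```

Keeping the conversion separate from the upload call makes it easy to inspect the payload before anything is sent.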