Upload a Dataset from CSV
In this guide, we will walk through creating a Dataset on Humanloop from a CSV.
A Dataset is a collection of input-output pairs that can be used to evaluate your Prompts, Tools or even Evaluators.
Prerequisites
You should have an existing Prompt on Humanloop with a variable defined using our double curly bracket syntax {{variable}}. If not, first follow our guide on creating a Prompt.
In this example, we’ll use a Prompt that categorises user queries about Humanloop’s product and docs by which feature they relate to.
Steps
To create a dataset from a CSV file, we’ll first create a CSV in Google Sheets that contains values for our Prompt variable {{query}} and then upload it to a Dataset on Humanloop.
Create a CSV file.
- In our Google Sheets example, we have a column called query, which contains possible values for our Prompt variable {{query}}. You can include as many columns as you have variables in your Prompt template.
- There is also a column called target, which populates the target output for the classifier Prompt. In this case, we use simple strings to define the target.
- More complex Datapoints that contain messages and structured objects for targets are supported, but they are harder to represent in a CSV file because they tend to become hard-to-read JSON. If you need more complex Datapoints, use the API instead.
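If you prefer to generate the file in code rather than in Google Sheets, the sketch below shows one way to do it with Python's standard `csv` module. The example queries and target labels (`prompts`, `datasets`, `evaluations`) are hypothetical placeholders and do not come from this guide's example sheet. The column names `query` and `target` match the ones described above.

```python
from __future__ import annotations

import csv
import io

# Hypothetical example rows: each pairs a value for the Prompt variable
# {{query}} with a simple string target label. Replace with your own data.
ROWS = [
    {"query": "How do I create a new Prompt?", "target": "prompts"},
    {"query": "Can I upload a Dataset from a CSV file?", "target": "datasets"},
    {"query": "How do I run an Evaluation?", "target": "evaluations"},
]


def rows_to_csv(rows: list[dict[str, str]], fieldnames: list[str]) -> str:
    """Serialise rows to CSV text, starting with a header row.

    csv.DictWriter quotes values that contain commas, quotes or newlines,
    so free-text queries are preserved intact.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To save the result, write the string returned by `rows_to_csv(ROWS, ["query", "target"])` to a file opened with `newline=""`, as the Python `csv` documentation recommends. You can then upload that file in place of a Google Sheets export.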
Export the Google Sheet to CSV
In Google Sheets, choose File → Download → Comma-separated values (.csv)
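Before uploading, you may want to check that every {{variable}} in your Prompt template has a matching column in the exported file. The sketch below is a hypothetical helper for that check and is not part of Humanloop. It assumes variable names consist of letters, digits and underscores, optionally with spaces inside the braces. That assumption may not match every template Humanloop accepts.

```python
from __future__ import annotations

import csv
import io
import re

# Assumption: variables look like {{name}} or {{ name }}, where the name is
# made of letters, digits and underscores.
VARIABLE_PATTERN = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def template_variables(template: str) -> set[str]:
    """Return the set of {{variable}} names used in a Prompt template."""
    return set(VARIABLE_PATTERN.findall(template))


def missing_columns(template: str, csv_text: str) -> set[str]:
    """Return template variables that have no matching CSV header column."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields no header at all
    return template_variables(template) - {name.strip() for name in header}
```

An empty result means every variable is covered. Extra columns such as `target` are ignored by this check, because they are mapped separately during upload.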
Create a new Dataset File
On Humanloop, select New at the bottom of the left-hand sidebar, then select Dataset.
Click Upload CSV
First name your dataset when prompted in the sidebar. Then select the Upload CSV button and either drag and drop the CSV file you created above or choose it using the file explorer. You will then be prompted to provide a commit message describing the initial state of the dataset.
Map the CSV columns
Map each of the CSV columns to one of input, message or target. To avoid uploading a column of your CSV, map it to the exclude option.
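Humanloop performs this mapping for you in the UI. The sketch below is only a hypothetical mental model of what it does. The `COLUMN_MAPPING` dict, the `notes` column and the `inputs`/`messages`/`target` output shape are illustrative assumptions. They are not Humanloop's internal implementation or its exact Datapoint schema. In particular, keying the target by column name is a simplification chosen for this sketch.

```python
from __future__ import annotations

import csv
import io
import json

# Hypothetical mapping from CSV column name to one of the upload options:
# "input", "message", "target" or "exclude".
COLUMN_MAPPING = {
    "query": "input",
    "target": "target",
    "notes": "exclude",
}


def row_to_datapoint(row: dict[str, str], mapping: dict[str, str]) -> dict:
    """Group one CSV row's values by their mapped role (illustration only)."""
    datapoint: dict = {"inputs": {}, "messages": [], "target": {}}
    for column, value in row.items():
        role = mapping.get(column, "exclude")  # unmapped columns are skipped
        if role == "input":
            datapoint["inputs"][column] = value
        elif role == "target":
            datapoint["target"][column] = value
        elif role == "message":
            # Assumes the cell holds a JSON-encoded list of messages.
            datapoint["messages"].extend(json.loads(value))
        # "exclude" columns are not copied anywhere.
    return datapoint


def csv_to_datapoints(csv_text: str, mapping: dict[str, str]) -> list[dict]:
    """Apply the mapping to every data row of a CSV file's text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row_to_datapoint(row, mapping) for row in reader]
```

Treating unmapped columns as excluded is a conservative choice for the sketch. It ensures that stray spreadsheet columns never leak into the inputs.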
To map columns to Messages, the column values need to be in a specific format. An example of this can be seen in our example Dataset.
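If you build a messages column in code, each cell has to hold the whole conversation as a single string. The sketch below assumes the format is a JSON array of objects with `role` and `content` keys. That is the common chat-message shape, but treat it as an assumption and compare it against Humanloop's example Dataset before relying on it. The helper names `messages_cell` and `write_rows` are hypothetical.

```python
from __future__ import annotations

import csv
import io
import json


def messages_cell(messages: list[dict[str, str]]) -> str:
    """Encode a list of chat messages as one JSON string for a CSV cell.

    Assumption: each message is an object with "role" and "content" keys.
    """
    return json.dumps(messages, ensure_ascii=False)


def write_rows(rows: list[dict]) -> str:
    """Write rows with a JSON-encoded messages column and a target column.

    The csv module wraps each JSON cell in quotes and doubles the inner
    double quotes, so the JSON survives the round trip through the file.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["messages", "target"])
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {"messages": messages_cell(row["messages"]), "target": row["target"]}
        )
    return buffer.getvalue()
```

Letting the `csv` module handle quoting avoids hand-escaping JSON. Hand-escaping is the main reason messages columns become hard to read and easy to break in a spreadsheet.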
Once you have mapped your columns, press Extend Current Dataset.
Commit the dataset
Click the commit button at the top of the Dataset editor and fill in a commit message. Press Commit again.
Your dataset is now uploaded and ready for use.
Next steps
🎉 Now that you have Datasets defined in Humanloop, you can leverage our Evaluations feature to systematically measure and improve the performance of your AI applications. See our guides on setting up Evaluators and Running an Evaluation to get started.
For different ways to create datasets, see the links below:
- Create a Dataset from existing Logs - useful for curating Datasets based on how your AI application has been behaving in the wild.
- Upload via API - useful for uploading more complex Datasets that may have nested JSON structures, which are difficult to represent in tabular CSV format, and for integrating with your existing data pipelines.