Create a Dataset via the API

A Dataset is a collection of input-output pairs that you can use to evaluate your Prompts, Tools, or even Evaluators.

Prerequisites

If you are using the SDK, the only prerequisite is to have the SDK installed and configured. If you are using the API directly, you will need to have an API key.

Install and initialize the SDK

First you need to install and initialize the SDK. If you have already done this, skip to the next section.

Open up your terminal and follow these steps:

Install the Humanloop SDK:

pip install humanloop

Initialize the SDK with your Humanloop API key (you can get it from the Organization Settings page).

from humanloop import Humanloop
humanloop = Humanloop(api_key="<YOUR HUMANLOOP KEY>")

# Check that the authentication was successful
print(humanloop.prompts.list())
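
If you would rather not paste the key into source code, one option is to read it from an environment variable. The sketch below is only an illustration: the variable name `HUMANLOOP_API_KEY` and the helpers `load_api_key` and `make_client` are assumptions made for this example, not part of the SDK.

```python
import os


def load_api_key(env_var: str = "HUMANLOOP_API_KEY") -> str:
    """Return the API key stored in `env_var`, or raise if it is missing.

    The variable name is an assumption for this sketch; use any name you like.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def make_client():
    """Build an SDK client without hardcoding the key in source."""
    # Imported here so load_api_key() can be used even before the SDK is installed.
    from humanloop import Humanloop

    return Humanloop(api_key=load_api_key())


# Usage (requires the SDK to be installed and a valid key to be set):
# humanloop = make_client()
# print(humanloop.prompts.list())
```

This keeps the key out of version control, and the helper fails early with a clear message if the variable has not been set.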

Steps

Using the API is a good way to integrate Humanloop with your existing data pipeline, or to make a one-off upload of a more complex Dataset that is hard to represent in a CSV file, such as one containing arrays of messages and JSON targets.
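
To make the shape of such a Dataset concrete, here is a minimal sketch that turns plain question/answer pairs into datapoints with a `messages` array and a JSON `target`, the same field names used in this guide's sample request. The helper `make_datapoint` and the records in `qa_pairs` are hypothetical placeholders; in a real pipeline the pairs would come from your own data.

```python
import json


def make_datapoint(question: str, answer: str) -> dict:
    """Wrap one question/answer pair in the messages + target shape."""
    return {
        "messages": [{"role": "user", "content": question}],
        "target": {"response": answer},
    }


# Hypothetical records; replace with rows from your own pipeline.
qa_pairs = [
    ("How do I manage my organization's API keys?",
     "Open Organization Settings, then select API Keys."),
    ("Can I monitor a Prompt with a code Evaluator?",
     "Yes, enable it from the Monitoring button."),
]

datapoints = [make_datapoint(q, a) for q, a in qa_pairs]

# The structure survives a JSON round trip, so it can be sent as a request body.
assert json.loads(json.dumps(datapoints)) == datapoints
print(json.dumps(datapoints[0], indent=2))
```

Nested structures like these are awkward to flatten into CSV columns, which is why the API route suits them better.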

Post data to the Datasets API

First, define some sample data containing user messages and the desired responses from a Support Agent Prompt. Then call the POST /v5/datasets endpoint to upload it as follows:

POST /v5/datasets

curl -X POST https://api.humanloop.com/v5/datasets \
     -H "X-API-KEY: <apiKey>" \
     -H "Content-Type: application/json" \
     -d '{
  "datapoints": [
    {
      "messages": [
        {
          "role": "user",
          "content": "How do i manage my organizations API keys?\n"
        }
      ],
      "target": {
        "response": "Hey, thanks for your questions. Here are steps for how to achieve: 1. Log in to the Humanloop Dashboard \n\n2. Click on \"Organization Settings.\"\n If you do not see this option, you might need to contact your organization admin to gain the necessary permissions.\n\n3. Within the settings or organization settings, select the option labeled \"API Keys\" on the left. Here you will be able to view and manage your API keys.\n\n4. You will see a list of existing API keys. You can perform various actions, such as:\n     - **Generate New API Key:** Click on the \"Generate New Key\" button if you need a new API key.\n     - **Revoke an API Key:** If you need to disable an existing key, find the key in the list and click the \"Revoke\" or \"Delete\" button.\n     - **Copy an API Key:** If you need to use an existing key, you can copy it to your clipboard by clicking the \"Copy\" button next to the key.\n\n5. **Save and Secure API Keys:** Make sure to securely store any new or existing API keys you are using. Treat them like passwords and do not share them publicly.\n\nIf you encounter any issues or need further assistance, it might be helpful to engage with an engineer or your IT department to ensure you have the necessary permissions and support.\n\nWould you need help with anything else?"
      }
    },
    {
      "messages": [
        {
          "role": "user",
          "content": "Hey, can do I use my code evaluator for monitoring my legal-copilot prompt?"
        }
      ],
      "target": {
        "response": "Hey, thanks for your questions. Here are steps for how to achieve: 1. Navigate to your Prompt dashboard. \n 2. Select the `Monitoring` button on the top right of the Prompt dashboard \n 3. Within the model select the Version of the Evaluator you want to turn on for monitoring. \n\nWould you need help with anything else?"
      }
    }
  ],
  "path": "datasets/support-queries",
  "version_name": "Initial version",
  "version_description": "Add two new questions and answers"
}'
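
If your pipeline is written in Python, the same request can be sketched with only the standard library. The URL, the `X-API-KEY` header, and the payload fields are taken from the curl call above; the helper names `build_request` and `upload_dataset`, the shortened placeholder response text, and the assumption that the endpoint replies with a JSON body are illustrative and should be checked against the API reference.

```python
import json
import urllib.request

API_URL = "https://api.humanloop.com/v5/datasets"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the same POST as the curl call, without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def upload_dataset(api_key: str, payload: dict) -> dict:
    """Send the request and decode the reply (assumed here to be JSON)."""
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(build_request(api_key, payload), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


payload = {
    "path": "datasets/support-queries",
    "version_name": "Initial version",
    "version_description": "Add two new questions and answers",
    "datapoints": [
        {
            "messages": [
                {"role": "user", "content": "How do i manage my organizations API keys?\n"}
            ],
            # Placeholder: put the full desired response text here.
            "target": {"response": "<desired response text>"},
        }
    ],
}

# Usage (performs a real network call, so it is left commented out):
# result = upload_dataset("<YOUR HUMANLOOP KEY>", payload)
```

Separating `build_request` from `upload_dataset` lets you inspect or log the exact request before anything is sent.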

Inspect the uploaded Dataset

After running this code, you will see a new Dataset in your Humanloop workspace at the path you specified (datasets/support-queries in this example), containing your sample data.

Next steps

🎉 Now that you have Datasets defined in Humanloop, you can leverage our Evaluations feature to systematically measure and improve the performance of your AI applications. See our guides on setting up Evaluators and Running an Evaluation to get started.

For other ways to create Datasets, see the links below:

Create a Dataset from existing Logs - useful for curating Datasets based on how your AI application has been behaving in the wild.
Upload data from CSV - useful for quickly uploading existing tabular data you’ve collected outside of Humanloop.