Upsert Dataset

Create a Dataset or update it with a new version if it already exists.

A Dataset is identified by its ID or its path. The Datapoints it contains determine the version of the Dataset.

By default, the new Dataset version will be set to the list of Datapoints provided in the request. You can also create a new version by adding or removing Datapoints from an existing version by specifying action as add or remove respectively. In this case, you may specify the version_id or environment query parameters to identify the existing version to base the new version on. If neither is provided, the latest created version will be used.
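The three actions can be modelled locally to make their effect concrete. The following is only an illustrative sketch: `apply_action` is a hypothetical helper, not part of any Humanloop SDK, and it assumes Datapoints are matched by plain equality, which may differ from how the service compares them.

```python
def apply_action(base, provided, action="set"):
    """Return the Datapoints of the new version as a list.

    base: Datapoints of the existing version (ignored for "set").
    provided: Datapoints sent in the request.
    Matching by equality is an assumption made for this sketch.
    """
    if action == "set":
        # The new version contains only the provided Datapoints.
        return list(provided)
    if action == "add":
        # Keep the base version and append Datapoints not already in it.
        return list(base) + [dp for dp in provided if dp not in base]
    if action == "remove":
        # Keep the base version minus the provided Datapoints.
        return [dp for dp in base if dp not in provided]
    raise ValueError(f"unknown action: {action!r}")


a = {"inputs": {"question": "a"}}
b = {"inputs": {"question": "b"}}
c = {"inputs": {"question": "c"}}

print(apply_action([a, b], [b, c], "set"))
print(apply_action([a, b], [b, c], "add"))
print(apply_action([a, b], [b, c], "remove"))
```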

You can provide version_name and version_description to identify and describe your versions. Version names must be unique within a Dataset; attempting to create a version with a name that already exists results in a 409 Conflict error.
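One possible way to cope with the 409 Conflict is to retry with a different version name. The sketch below assumes a hypothetical `send` callable that performs the request and returns a `(status_code, response_dict)` pair; the suffixing scheme is likewise an illustration, not a Humanloop convention.

```python
def upsert_with_unique_name(send, body, max_attempts=5):
    """Call send(body), retrying with a suffixed version_name on HTTP 409.

    send is a hypothetical callable returning (status_code, response_dict).
    Any status other than 409 is returned to the caller unchanged.
    """
    base_name = body["version_name"]
    for attempt in range(max_attempts):
        # First try the requested name, then name-2, name-3, ...
        name = base_name if attempt == 0 else f"{base_name}-{attempt + 1}"
        status, response = send({**body, "version_name": name})
        if status != 409:
            return status, response
    raise RuntimeError(f"no free version name after {max_attempts} attempts")


# Stand-in for the real API, used only to exercise the helper.
taken_names = {"v1", "v1-2"}

def fake_send(body):
    if body["version_name"] in taken_names:
        return 409, {"detail": "version name already exists"}
    return 200, {"version_name": body["version_name"]}

status, response = upsert_with_unique_name(
    fake_send, {"version_name": "v1", "datapoints": []}
)
```

Whether silently renaming a version is appropriate depends on the workflow; in many cases surfacing the conflict to the caller is the safer choice.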

Humanloop also deduplicates Datapoints. If you try to add a Datapoint that already exists, it will be ignored. If you intentionally want to add a duplicate Datapoint, you can add a unique identifier to the Datapoint’s inputs such as {_dedupe_id: <unique ID>}.
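As a minimal sketch of the `_dedupe_id` approach, the helper below copies a Datapoint and adds a random identifier to its inputs. The function name `with_dedupe_id` and the use of a UUID are assumptions made for illustration; any value that is unique per Datapoint should serve the same purpose.

```python
import uuid


def with_dedupe_id(datapoint):
    """Return a copy of the Datapoint with a unique _dedupe_id in its inputs."""
    inputs = dict(datapoint.get("inputs", {}))
    inputs["_dedupe_id"] = str(uuid.uuid4())
    # The original Datapoint is left untouched.
    return {**datapoint, "inputs": inputs}


original = {
    "inputs": {"question": "What is the capital of France?"},
    "target": {"answer": "Paris"},
}
first_copy = with_dedupe_id(original)
second_copy = with_dedupe_id(original)
```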

Query parameters

version_id (string, optional)

ID of the specific Dataset version to base the created Version on. Only used when action is "add" or "remove".

environment (string, optional)

Name of the Environment identifying a deployed Version to base the created Version on. Only used when action is "add" or "remove".

include_datapoints (boolean, optional, defaults to false)

If set to true, include all Datapoints in the response. Defaults to false. Consider using the paginated List Datapoints endpoint instead.
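The query parameters can be assembled with the standard library. This is a sketch only: `upsert_url` is a hypothetical helper, the base URL is taken from the curl example on this page, and the environment name used in the usage line is a placeholder.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.humanloop.com/v5/datasets"


def upsert_url(version_id=None, environment=None, include_datapoints=False):
    """Build the endpoint URL, adding only the query parameters that are set."""
    params = {}
    if version_id is not None:
        params["version_id"] = version_id
    if environment is not None:
        params["environment"] = environment
    if include_datapoints:
        params["include_datapoints"] = "true"
    return f"{BASE_URL}?{urlencode(params)}" if params else BASE_URL


# Base the new version on a specific existing version.
print(upsert_url(version_id="dsv_pqr678"))
# Base it on the version deployed to a (placeholder) environment name.
print(upsert_url(environment="production", include_datapoints=True))
```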

Request

This endpoint expects an object.

datapoints (list of objects, required)

The Datapoints to create this Dataset version with. Modify the action field to determine how these Datapoints are used.

path (string, optional)

Path of the Dataset, including the name. This locates the Dataset in the Humanloop filesystem and is used as a unique identifier. For example: folder/name or just name.

id (string, optional)

ID for an existing Dataset.

action (enum, optional)

The action to take with the provided Datapoints.

If "set", the created version will only contain the Datapoints provided in this request.
If "add", the created version will contain the Datapoints provided in this request in addition to the Datapoints in the target version.
If "remove", the created version will contain the Datapoints in the target version except for the Datapoints provided in this request.

If "add" or "remove", one of the version_id or environment query parameters may be provided.

Allowed values: "set", "add", "remove"

attributes (map from strings to any, optional)

Additional fields to describe the Dataset. Helpful to separate Dataset versions from each other with details on how they were created or used.

version_name (string, optional)

Unique name for the Dataset version. Version names must be unique for a given Dataset.

version_description (string, optional)

Description of the version, e.g., the changes made in this version.

Response

Successful Response

path (string)

Path of the Dataset, including the name, which is used as a unique identifier.

id (string)

Unique identifier for the Dataset. Starts with ds_.

name (string)

Name of the Dataset, which is used as a unique identifier.

version_id (string)

Unique identifier for the specific Dataset Version. If no query parameters are provided, the default deployed Dataset Version is returned. Starts with dsv_.

created_at (datetime)

updated_at (datetime)

last_used_at (datetime)

datapoints_count (integer)

The number of Datapoints in this Dataset version.

directory_id (string or null)

ID of the directory that the file is in on Humanloop.

description (string or null)

Description of the Dataset.

schema (map from strings to any, or null)

The JSON schema for the File.

readme (string or null)

Long description of the file.

tags (list of strings or null)

List of tags associated with the file.

type ("dataset" or null, defaults to "dataset")

environments (list of objects or null)

The list of environments the Dataset Version is deployed to.

created_by (any or null)

The user who created the Dataset.

version_name (string or null)

Unique name for the Dataset version. Version names must be unique for a given Dataset.

version_description (string or null)

Description of the version, e.g., the changes made in this version.

datapoints (list of objects or null)

The list of Datapoints in this Dataset version. Only provided if explicitly requested.

attributes (map from strings to any, or null)

Additional fields to describe the Dataset. Helpful to separate Dataset versions from each other with details on how they were created or used.

curl -X POST https://api.humanloop.com/v5/datasets \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "datapoints": [
      {
        "inputs": {
          "question": "What is the capital of France?"
        },
        "target": {
          "answer": "Paris"
        }
      },
      {
        "inputs": {
          "question": "Who wrote Hamlet?"
        },
        "target": {
          "answer": "William Shakespeare"
        }
      }
    ],
    "path": "test-questions",
    "action": "set",
    "version_name": "test-questions-v1",
    "version_description": "Add two new questions and answers"
  }'
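The same request can be sketched in Python using only the standard library. `build_upsert_request` is a hypothetical helper rather than an official client; it prepares the request without sending it, so the sketch runs without credentials or network access.

```python
import json
import urllib.request


def build_upsert_request(api_key, body, url="https://api.humanloop.com/v5/datasets"):
    """Prepare (but do not send) a POST request mirroring the curl example."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )


body = {
    "datapoints": [
        {
            "inputs": {"question": "What is the capital of France?"},
            "target": {"answer": "Paris"},
        },
        {
            "inputs": {"question": "Who wrote Hamlet?"},
            "target": {"answer": "William Shakespeare"},
        },
    ],
    "path": "test-questions",
    "action": "set",
    "version_name": "test-questions-v1",
    "version_description": "Add two new questions and answers",
}

request = build_upsert_request("<apiKey>", body)
# To send it, pass the request to urllib.request.urlopen(request);
# that step needs a valid API key and network access.
```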

{
  "path": "test-questions",
  "id": "ds_mno345",
  "name": "test-questions",
  "version_id": "dsv_pqr678",
  "created_at": "2024-05-01T12:00:00Z",
  "updated_at": "2024-05-01T12:00:00Z",
  "last_used_at": "2024-05-01T12:00:00Z",
  "datapoints_count": 4,
  "type": "dataset"
}
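A response like the one above could be parsed as follows. This is a sketch under stated assumptions: `parse_dataset_response` is a hypothetical helper, the prefix checks rely on the ds_ and dsv_ prefixes documented for id and version_id, and timestamps are assumed to be ISO 8601 strings ending in "Z" as in the sample.

```python
import json
from datetime import datetime


def parse_dataset_response(text):
    """Decode the response body, check ID prefixes, and parse timestamps."""
    data = json.loads(text)
    if not data["id"].startswith("ds_"):
        raise ValueError(f"unexpected Dataset ID: {data['id']!r}")
    if not data["version_id"].startswith("dsv_"):
        raise ValueError(f"unexpected version ID: {data['version_id']!r}")
    for key in ("created_at", "updated_at", "last_used_at"):
        # Replace the trailing "Z" so older Python versions can parse it.
        data[key] = datetime.fromisoformat(data[key].replace("Z", "+00:00"))
    return data


sample = """{
  "path": "test-questions",
  "id": "ds_mno345",
  "name": "test-questions",
  "version_id": "dsv_pqr678",
  "created_at": "2024-05-01T12:00:00Z",
  "updated_at": "2024-05-01T12:00:00Z",
  "last_used_at": "2024-05-01T12:00:00Z",
  "datapoints_count": 4,
  "type": "dataset"
}"""

dataset = parse_dataset_response(sample)
```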
