Upsert Dataset
Create a Dataset or update it with a new version if it already exists.
Datasets are identified by the ID or their path. The datapoints determine the versions of the Dataset.
By default, the new Dataset version will be set to the list of Datapoints provided in the request. You can also create a new version by adding or removing Datapoints from an existing version by specifying action as add or remove respectively. In this case, you may specify the version_id or environment query parameters to identify the existing version to base the new version on. If neither is provided, the latest created version will be used.
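As a minimal sketch of the rules above, the helper below assembles the query parameters and JSON body for an upsert call. It does not send anything, since this section does not give the endpoint URL or authentication headers. The function name build_upsert_request is hypothetical, and the body keys "id", "path", "action" and "datapoints" are assumed from the field descriptions on this page rather than confirmed.

```python
from typing import Any, Dict, List, Optional, Tuple


def build_upsert_request(
    datapoints: List[Dict[str, Any]],
    path: Optional[str] = None,
    dataset_id: Optional[str] = None,
    action: str = "set",
    version_id: Optional[str] = None,
    environment: Optional[str] = None,
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (query_params, json_body) for an upsert. Hypothetical helper."""
    if action not in ("set", "add", "remove"):
        raise ValueError(f"unknown action: {action!r}")
    if path is None and dataset_id is None:
        raise ValueError("a Dataset is identified by its ID or its path")

    # version_id / environment are only used when adding to or removing
    # from an existing version; for "set" they are left out entirely.
    params: Dict[str, str] = {}
    if action in ("add", "remove"):
        if version_id is not None:
            params["version_id"] = version_id
        if environment is not None:
            params["environment"] = environment
    # If neither is supplied, the latest created version is the base.

    # Body keys are assumptions based on the field descriptions on this page.
    body: Dict[str, Any] = {"datapoints": datapoints, "action": action}
    if path is not None:
        body["path"] = path
    if dataset_id is not None:
        body["id"] = dataset_id
    return params, body
```

For example, build_upsert_request([{"inputs": {"q": "hi"}}], path="folder/name", action="add", version_id="dsv_123") yields the query parameters {"version_id": "dsv_123"}, while the same call with action="set" yields no query parameters.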
You can provide version_name and version_description to identify and describe your versions. Version names must be unique within a Dataset - attempting to create a version with a name that already exists will result in a 409 Conflict error.
Humanloop also deduplicates Datapoints. If you try to add a Datapoint that already exists, it will be ignored. If you intentionally want to add a duplicate Datapoint, you can add a unique identifier to the Datapoint’s inputs such as {_dedupe_id: <unique ID>}.
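The deduplication workaround can be sketched as a small helper that copies a Datapoint and adds a fresh _dedupe_id to its inputs. The helper name with_dedupe_id is hypothetical, the Datapoint is assumed to be a dict with an "inputs" mapping, and using a UUID as the unique ID is one possible choice rather than a requirement.

```python
import uuid
from typing import Any, Dict


def with_dedupe_id(datapoint: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the datapoint whose inputs carry a unique _dedupe_id."""
    # Copy the inputs so the caller's datapoint is not mutated.
    inputs = dict(datapoint.get("inputs", {}))
    inputs["_dedupe_id"] = str(uuid.uuid4())
    return {**datapoint, "inputs": inputs}


original = {"inputs": {"question": "What is 2 + 2?"}}
first_copy = with_dedupe_id(original)
second_copy = with_dedupe_id(original)
```

Here first_copy and second_copy share the same question but differ in _dedupe_id, so they would no longer be identical Datapoints.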
Headers
Query parameters
ID of the specific Dataset version to base the created Version on. Only used when action is "add" or "remove".
Name of the Environment identifying a deployed Version to base the created Version on. Only used when action is "add" or "remove".
If set to true, include all Datapoints in the response. Defaults to false. Consider using the paginated List Datapoints endpoint instead.
Request
The Datapoints to create this Dataset version with. Modify the action field to determine how these Datapoints are used.
Path of the Dataset, including the name. This locates the Dataset in the Humanloop filesystem and is used as a unique identifier. For example: folder/name or just name.
ID for an existing Dataset.
The action to take with the provided Datapoints.
- If "set", the created version will only contain the Datapoints provided in this request.
- If "add", the created version will contain the Datapoints provided in this request in addition to the Datapoints in the target version.
- If "remove", the created version will contain the Datapoints in the target version except for the Datapoints provided in this request.
If "add" or "remove", one of the version_id or environment query parameters may be provided.
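The three actions can be illustrated with a small local model. This is only a sketch of the semantics described above, not Humanloop's implementation: the function apply_action is hypothetical, and it assumes two Datapoints count as duplicates when their full contents are equal, which this page does not specify.

```python
import json
from typing import Any, Dict, List

Datapoint = Dict[str, Any]


def _key(datapoint: Datapoint) -> str:
    # Assumption: Datapoints are duplicates when their contents are equal.
    return json.dumps(datapoint, sort_keys=True)


def apply_action(
    target: List[Datapoint], provided: List[Datapoint], action: str
) -> List[Datapoint]:
    """Model the Datapoints of the created version (local sketch only)."""
    if action == "remove":
        # Target version minus the provided Datapoints.
        removed = {_key(dp) for dp in provided}
        return [dp for dp in target if _key(dp) not in removed]
    if action == "set":
        candidates = list(provided)  # only the provided Datapoints
    elif action == "add":
        candidates = list(target) + list(provided)  # target plus provided
    else:
        raise ValueError(f"unknown action: {action!r}")

    # Ignore duplicates, keeping the first occurrence of each Datapoint.
    seen = set()
    result = []
    for dp in candidates:
        key = _key(dp)
        if key not in seen:
            seen.add(key)
            result.append(dp)
    return result


a = {"inputs": {"q": "a"}}
b = {"inputs": {"q": "b"}}
c = {"inputs": {"q": "c"}}
added = apply_action([a, b], [b, c], "add")
removed = apply_action([a, b], [a], "remove")
replaced = apply_action([a, b], [c], "set")
```

With a target version containing a and b, adding b and c gives a, b, c (the duplicate b is ignored), removing a gives b, and setting c gives only c.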
Additional fields to describe the Dataset. Helpful to separate Dataset versions from each other with details on how they were created or used.
Unique name for the Dataset version. Version names must be unique for a given Dataset.
Description of the version, e.g., the changes made in this version.
Response
Successful Response
Path of the Dataset, including the name, which is used as a unique identifier.
Unique identifier for the Dataset. Starts with ds_.
Name of the Dataset, which is used as a unique identifier.
Unique identifier for the specific Dataset Version. If no query params are provided, the default deployed Dataset Version is returned. Starts with dsv_.
The number of Datapoints in this Dataset version.
ID of the directory that the file is in on Humanloop.
Description of the Dataset.
Long description of the file.
The list of environments the Dataset Version is deployed to.
The user who created the Dataset.
Unique name for the Dataset version. Version names must be unique for a given Dataset.
Description of the version, e.g., the changes made in this version.
The list of Datapoints in this Dataset version. Only provided if explicitly requested.
Additional fields to describe the Dataset. Helpful to separate Dataset versions from each other with details on how they were created or used.
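Because Dataset IDs start with ds_ and Dataset Version IDs start with dsv_, a client can tell the two apart by prefix. The helper below is a hypothetical convenience for sanity-checking identifiers read from a response. It relies only on the prefixes stated above and assumes nothing else about the ID format.

```python
def classify_id(value: str) -> str:
    """Classify an identifier by the prefixes documented for this endpoint."""
    if value.startswith("dsv_"):
        return "dataset_version"
    if value.startswith("ds_"):
        return "dataset"
    return "unknown"
```

A Dataset Version ID passed where a Dataset ID is expected can then be caught before the request is made.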