Set up evaluations using API
In this guide, we'll walk through an example of using our API to create a dataset and trigger an evaluation.
Paid Feature
This feature is not available on the Free tier. Please contact us if you wish to learn more about our Enterprise plan.
API Options
This guide uses our Python SDK. All of the endpoints used are available in our TypeScript SDK and directly via the API.
Prerequisites:

- A Humanloop account and Humanloop API key
- An OpenAI API key
- The Humanloop Python SDK installed (`pip install humanloop`)
Create evaluation
We’ll go through how to use the SDK in a Python script to set up a project, create a dataset, and finally trigger an evaluation.
Set up a project
Import Humanloop and set your Humanloop and OpenAI API keys.
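A minimal setup sketch, assuming the v0.x Humanloop Python SDK (`pip install humanloop`); the placeholder key strings are yours to fill in:

```python
from humanloop import Humanloop

# Replace these placeholders with your own keys.
HUMANLOOP_API_KEY = "<YOUR_HUMANLOOP_API_KEY>"
OPENAI_API_KEY = "<YOUR_OPENAI_API_KEY>"

# Initialise the Humanloop SDK client with your Humanloop API key.
humanloop = Humanloop(api_key=HUMANLOOP_API_KEY)
```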
Create a project and register your first model config
We’ll use OpenAI’s GPT-4 for extracting product feature names from customer queries in this example. The first model config created against the project is automatically deployed:
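A sketch of this step, assuming the v0.x SDK's `projects.create` and `model_configs.register` calls; the parameter names, project name, and prompt wording below are illustrative and may differ in your SDK version:

```python
from humanloop import Humanloop

humanloop = Humanloop(api_key="<YOUR_HUMANLOOP_API_KEY>")

# Create the project that will hold our configs, dataset and logs.
# The project name here is illustrative.
project = humanloop.projects.create(name="evals-demo")

# Register a model config for feature extraction. The first config
# registered against a project is automatically deployed.
model_config = humanloop.model_configs.register(
    project_id=project.id,
    model="gpt-4",
    chat_template=[
        {
            "role": "system",
            "content": (
                "Extract the product feature names mentioned in the "
                "user's query and return them as JSON in the form "
                '{"features": ["feature_1", "feature_2"]}.'
            ),
        }
    ],
)
```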
If you log onto your Humanloop account you will now see your project with a single model config defined:
![](https://fdr-prod-docs-files-public.s3.amazonaws.com/https://humanloop.docs.buildwithfern.com/docs/2024-07-19T19:46:40.702Z/assets/images/7a1a9ca-Screenshot_2023-08-12_at_15.15.27.png)
Create a dataset
Follow the steps in our guide to Upload a Dataset via API.
Now test your model manually by generating a log for one of the datapoints’ messages:
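A sketch of generating a log against the deployed config, assuming the v0.x SDK's `chat_deployed` call; the message content mirrors a hypothetical datapoint from the dataset, and `project` comes from the earlier setup step:

```python
# Call the deployed model config for this project; Humanloop records
# the resulting log automatically.
chat_response = humanloop.chat_deployed(
    project_id=project.id,
    messages=[
        {
            "role": "user",
            "content": "Does the app support dark mode and offline sync?",
        }
    ],
)

# Inspect the model's raw output for this log.
print(chat_response.data[0].output)
```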
You can see from the `output` field in the response that the model has done a good job of extracting the mentioned features in the desired JSON format:
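To sanity-check the shape of the output yourself, you can parse it with the standard library; the output string below is illustrative rather than a real response:

```python
import json

# Illustrative model output; in practice this comes from the log's
# `output` field in the API response.
output = '{"features": ["dark mode", "offline sync"]}'

extracted = json.loads(output)
print(extracted["features"])  # ['dark mode', 'offline sync']
```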
Create an evaluator
Now that you have a project with a model config and a dataset defined, you can create an evaluator. The evaluator determines the success criteria for a log generated by the model, using the target defined on the corresponding test datapoint.
Create an evaluator to determine if the extracted JSON is correct and test it against the generated log and the corresponding test datapoint:
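The evaluator body might look like the following sketch: a Python function that parses the log's output and compares the extracted features to the datapoint's target. The `output` and `target` field names are assumptions for this sketch:

```python
import json


def check_feature_json(log, testcase):
    """Return True when the log's output parses as JSON and its feature
    list matches the target defined on the test datapoint.

    The `output` and `target` field names are assumptions here.
    """
    try:
        extracted = json.loads(log["output"])
        expected = json.loads(testcase["target"])
    except (json.JSONDecodeError, TypeError, KeyError):
        return False
    # Feature order should not matter, so compare as sets.
    return set(extracted.get("features", [])) == set(expected.get("features", []))


# A matching pair passes even when the features are listed in a
# different order...
log = {"output": '{"features": ["dark mode", "offline sync"]}'}
testcase = {"target": '{"features": ["offline sync", "dark mode"]}'}
print(check_feature_json(log, testcase))  # True

# ...while unparsable output fails the check.
print(check_feature_json({"output": "not json"}, testcase))  # False
```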
Submit this evaluator to Humanloop
This means it can be used for future evaluations triggered via the UI or the API:
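A sketch of submitting the evaluator, assuming the v0.x SDK's `evaluators.create` call; the field names (`arguments_type`, `return_type`) are assumptions and may differ in your SDK version, and `humanloop` is the client from the setup step:

```python
# The evaluator's source code is passed to Humanloop as a string.
evaluator_code = '''
import json

def check_feature_json(log, testcase):
    try:
        extracted = json.loads(log["output"])
        expected = json.loads(testcase["target"])
    except (json.JSONDecodeError, TypeError, KeyError):
        return False
    return set(extracted.get("features", [])) == set(expected.get("features", []))
'''

evaluator = humanloop.evaluators.create(
    name="Feature JSON check",
    description="Checks the extracted feature JSON against the target.",
    code=evaluator_code,
    arguments_type="target_required",
    return_type="boolean",
)
```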
In your Humanloop project you will now see an evaluator defined:
![](https://fdr-prod-docs-files-public.s3.amazonaws.com/https://humanloop.docs.buildwithfern.com/docs/2024-07-19T19:46:40.702Z/assets/images/c395e8e-image.png)
Launch an evaluation
You can now run an evaluation of the model config using the dataset and evaluator. In practice, you can include more than one evaluator:
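A sketch of triggering the evaluation, assuming the v0.x SDK's `evaluations.create` call; `project`, `model_config` and `evaluator` come from the earlier steps, and the dataset ID placeholder is yours to fill in. `evaluator_ids` accepts several evaluators:

```python
evaluation = humanloop.evaluations.create(
    project_id=project.id,
    config_id=model_config.id,
    dataset_id="<YOUR_DATASET_ID>",
    evaluator_ids=[evaluator.id],
)
print(evaluation.id)
```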
Navigate to your Humanloop account to see the evaluation results. Initially it will be in a pending state, but given the small number of test cases it will quickly move to completed. The datapoints generated by your model as part of the evaluation will also be recorded in your project’s logs table.
![](https://fdr-prod-docs-files-public.s3.amazonaws.com/https://humanloop.docs.buildwithfern.com/docs/2024-07-19T19:46:40.702Z/assets/images/dc52a7b-image.png)
Create evaluation - full script
Here is the full script you can copy and paste and run in your Python environment:
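The following is a hedged end-to-end sketch under the same assumptions as the steps above (v0.x SDK call and parameter names, illustrative prompt wording, placeholders to replace), not a verbatim reproduction of the original script:

```python
from humanloop import Humanloop

# Replace the placeholders before running.
humanloop = Humanloop(api_key="<YOUR_HUMANLOOP_API_KEY>")

# 1. Set up a project; the first registered config is auto-deployed.
project = humanloop.projects.create(name="evals-demo")
model_config = humanloop.model_configs.register(
    project_id=project.id,
    model="gpt-4",
    chat_template=[
        {
            "role": "system",
            "content": (
                "Extract the product feature names mentioned in the "
                "user's query and return them as JSON in the form "
                '{"features": ["feature_1", "feature_2"]}.'
            ),
        }
    ],
)

# 2. Create a dataset by following the Upload a Dataset via API guide,
#    then fill in its ID below.
dataset_id = "<YOUR_DATASET_ID>"

# 3. Register the evaluator; its source is passed as a string.
evaluator_code = '''
import json

def check_feature_json(log, testcase):
    try:
        extracted = json.loads(log["output"])
        expected = json.loads(testcase["target"])
    except (json.JSONDecodeError, TypeError, KeyError):
        return False
    return set(extracted.get("features", [])) == set(expected.get("features", []))
'''
evaluator = humanloop.evaluators.create(
    name="Feature JSON check",
    description="Checks the extracted feature JSON against the target.",
    code=evaluator_code,
    arguments_type="target_required",
    return_type="boolean",
)

# 4. Trigger the evaluation.
evaluation = humanloop.evaluations.create(
    project_id=project.id,
    config_id=model_config.id,
    dataset_id=dataset_id,
    evaluator_ids=[evaluator.id],
)
print(evaluation.id)
```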