This quickstart will take you through running your first Eval with Humanloop.
You’ll learn how to trigger an evaluation from code, interpret an eval-report on Humanloop and use it to improve your AI features.
First you need to install and initialize the SDK. If you have already done this, skip to the next section.
Open up your terminal and follow these steps:
Add the following code in a file:
This sets up the basic structure of an Evaluation:
The file argument defines the callable as well as the location where the evaluation results will appear on Humanloop. The Evaluation returns a checks object that contains the results of the eval per Evaluator.
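To make that structure concrete, here is a minimal local sketch in plain Python. It does not call the Humanloop SDK: `run_local_eval`, `exact_match`, `my_app`, the `Demo/App` path, the dictionary keys, and the sample datapoints are all hypothetical stand-ins chosen for illustration. It only shows how a callable, a dataset, and a set of Evaluators fit together, and how results can be grouped per Evaluator.

```python
from statistics import mean
from typing import Callable, Dict, List


def my_app(question: str) -> str:
    """Hypothetical callable standing in for your AI feature."""
    answers = {"What is 2 + 2?": "4"}
    return answers.get(question, "I don't know")


def exact_match(output: str, target: str) -> float:
    """Hypothetical evaluator: 1.0 if the output equals the target, else 0.0."""
    return 1.0 if output == target else 0.0


def run_local_eval(
    file: dict,
    dataset: List[dict],
    evaluators: Dict[str, Callable[[str, str], float]],
) -> Dict[str, float]:
    """Call file["callable"] on every datapoint and average each evaluator's scores."""
    fn = file["callable"]
    outputs = [fn(dp["input"]) for dp in dataset]
    return {
        name: mean(ev(out, dp["target"]) for out, dp in zip(outputs, dataset))
        for name, ev in evaluators.items()
    }


# Assumed shape: a location label plus the function under test.
file = {"path": "Demo/App", "callable": my_app}
dataset = [
    {"input": "What is 2 + 2?", "target": "4"},
    {"input": "What is the capital of France?", "target": "Paris"},
]

checks = run_local_eval(file, dataset, {"Exact match": exact_match})
print(checks)  # one averaged score per evaluator
```

In your real script the Humanloop SDK performs this step and reports the results to Humanloop; the sketch only mirrors the shape of the inputs and of the per-Evaluator results.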
Run your script with the following command:
You will see a URL to view your evals on Humanloop. A summary of progress and the final results will be displayed directly in your terminal:
Navigate to the URL provided in your terminal to see the result of running your script on Humanloop.
This Stats view will show you the live progress of your local eval runs as well as summary statistics of the final results.
Each new run will add a column to your Stats view, allowing you to compare the performance of your LLM app over time.
The Logs and Review tabs allow you to drill into individual datapoints and view the outputs of different runs side-by-side to understand how to improve your LLM app.
Your first run resulted in a Semantic similarity score of 3 (out of 5) and an Exact match score of 0. Try making a change to your callable to improve the output, then re-run your script.
A second run will be added to your Stats view and the difference in performance will be displayed.
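The improve-and-re-run loop can also be sketched locally. The example below is a hypothetical stand-in, not the Humanloop SDK: `score_run`, both callables, and the datapoints are invented for illustration. It scores two versions of a callable with an exact-match check and prints the change between them, which is the kind of comparison the Stats view displays once a second run is added.

```python
from typing import Callable, List, Tuple

# Hypothetical dataset of (input, target) pairs.
DATASET: List[Tuple[str, str]] = [
    ("hello", "HELLO"),
    ("world", "WORLD"),
]


def callable_v1(text: str) -> str:
    # First attempt: capitalizes only the first letter, so it never matches the target.
    return text.capitalize()


def callable_v2(text: str) -> str:
    # Improved attempt: uppercases the whole string.
    return text.upper()


def score_run(fn: Callable[[str], str]) -> float:
    """Fraction of datapoints whose output equals the target exactly."""
    hits = sum(1 for inp, target in DATASET if fn(inp) == target)
    return hits / len(DATASET)


run_1 = score_run(callable_v1)
run_2 = score_run(callable_v2)
print(f"run 1: {run_1:.2f}, run 2: {run_2:.2f}, change: {run_2 - run_1:+.2f}")
```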
Now that you’ve run your first eval on Humanloop, you can: