Self-hosted evaluations
For some use cases, you may wish to run your evaluation process outside of Humanloop rather than use the evaluators that run in the Humanloop runtime.
For example, you may have implemented an evaluator that uses your own custom model or that has to interact with multiple systems. In these cases, you can still use the datasets you have curated on Humanloop and consolidate all of the results alongside the prompts you maintain in Humanloop.
In this guide, we’ll show an example of setting up a simple script to run such a self-hosted evaluation using our Python SDK.
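Before wiring anything to the SDK, it can help to see the overall shape of a self-hosted evaluation loop. The following is a minimal sketch only: it does not call the Humanloop SDK, and `Datapoint`, `EvaluationResult`, `exact_match_evaluator`, `fake_generate`, and `run_self_hosted_evaluation` are all hypothetical names introduced for illustration. In a real script, the datapoints would be fetched from your Humanloop dataset, the generation step would call your own model, and the collected results would be logged back to the evaluation run through the SDK.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Datapoint:
    """Hypothetical stand-in for a datapoint fetched from a Humanloop dataset."""
    id: str
    inputs: dict
    target: str


@dataclass
class EvaluationResult:
    """Hypothetical record of one evaluator judgment, ready to be reported."""
    datapoint_id: str
    output: str
    passed: bool


def exact_match_evaluator(output: str, target: str) -> bool:
    """Toy self-hosted evaluator: case-insensitive exact match."""
    return output.strip().lower() == target.strip().lower()


def fake_generate(inputs: dict) -> str:
    """Placeholder for your own model call; it simply uppercases a hint."""
    return inputs["answer_hint"].upper()


def run_self_hosted_evaluation(
    datapoints: Iterable[Datapoint],
    generate: Callable[[dict], str],
    evaluator: Callable[[str, str], bool],
) -> List[EvaluationResult]:
    """Generate an output per datapoint and score it with the evaluator."""
    results = []
    for dp in datapoints:
        output = generate(dp.inputs)
        passed = evaluator(output, dp.target)
        results.append(EvaluationResult(dp.id, output, passed))
    return results


# Assumed sample data; a real script would pull these from Humanloop.
datapoints = [
    Datapoint(id="dp_1", inputs={"answer_hint": "paris"}, target="Paris"),
    Datapoint(id="dp_2", inputs={"answer_hint": "berlin"}, target="Munich"),
]

results = run_self_hosted_evaluation(datapoints, fake_generate, exact_match_evaluator)
for r in results:
    print(r.datapoint_id, r.output, r.passed)
```

Keeping the evaluator as a plain function that takes an output and a target makes it easy to swap in a custom model or a multi-system check without changing the loop around it.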
Once the script has created the evaluation run, you’ll see a new run in the Humanloop app, under the Evaluations tab of your project. It should have the status running.
After running this script with the appropriate resource IDs (project, dataset, model config), you should see the results in the Humanloop app, right alongside any other evaluations you have performed using the Humanloop runtime.
