February

New Tool creation flow

February 26th, 2024

You can now create Tools in the same way as you create Prompts and Directories. This makes Tools easier to discover and quicker to create.

To create a new Tool, simply press the New button in the directory of your choice and select one of our supported Tools, such as the JSON Schema tool for function calling or our Pinecone tool to integrate with your RAG pipelines.


Tool editor and deployments

February 26th, 2024

You can now manage and edit your Tools in our new Tool Editor. This is found in each Tool file and lets you create and iterate on your tools. We have also introduced deployments for Tools, so you can better control which versions of a tool are used within your Prompts.

Tool Editor

This replaces the previous Tools section, which has been removed. The editor lets you edit any of the tool types that Humanloop supports (JSON Schema, Google, Pinecone, Snippet, Get API) and commit new Versions.

Deployment

Tools can now be deployed. You can pick a version of your Tool and deploy it. Once deployed, it can be used and referenced in a Prompt editor.

As an example, suppose you have a version of a Snippet tool with the signature snippet(key) and a key/value pair of “helpful”/“You are a helpful assistant”. If you decide you would rather the value say “You are a funny assistant”, you can commit a new version of the Tool with the updated value. This won’t affect any of your Prompts that reference the Snippet tool until you deploy the second version, after which each Prompt will automatically start using the funny assistant text.
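To make the difference between committing and deploying concrete, here’s a purely conceptual sketch (not the Humanloop API, just illustrative Python) of how a Prompt that references snippet(key) resolves its value:

# Conceptual illustration only - not the Humanloop API.
# Committing a new Tool version does not change what referencing Prompts see;
# only deploying it does.
snippet_versions = {
    "v1": {"helpful": "You are a helpful assistant"},
    "v2": {"helpful": "You are a funny assistant"},  # committed, not yet deployed
}
deployed_version = "v1"

def resolve_snippet(key):
    # Prompts that reference snippet(key) always read from the deployed version.
    return snippet_versions[deployed_version][key]

assert resolve_snippet("helpful") == "You are a helpful assistant"

deployed_version = "v2"  # deploying flips every referencing Prompt at once
assert resolve_snippet("helpful") == "You are a funny assistant"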


Prompt labels and hover cards

February 26th, 2024

We’ve rolled out a unified label for Prompt Versions so you can quickly identify them throughout our UI. As these labels roll out across the app, you’ll have a consistent way of interacting with and identifying your Prompt Versions.

Label and hover card for a deployed Prompt Version

The labels show the deployed status and short ID of the Prompt Version. When you hover over these labels, you will see a card that displays the commit message and authorship of the committed version.

You’ll be able to find these labels in many places across the app, such as in your Prompt’s deployment settings, in the Logs drawer, and in the Editor.

The Prompt Version label and hover card in a Prompt Editor

As a quick tip, the color of the checkmark in the label indicates whether the version has been deployed. If the Prompt Version has not been deployed, the checkmark will be black.

A Prompt Version that has not been deployed

Committing Prompt Versions

February 26th, 2024

Building on our terminology improvements from Project -> Prompt, we’ve now updated Model Configs -> Prompt Versions to improve consistency in our UI.

This is part of a larger suite of changes to improve the workflows around how entities are managed on Humanloop and to make them easier to work with and understand. We will also be following up soon with a new and improved major version of our API that encapsulates all of our terminology improvements.

In addition to the terminology update, we’ve improved our Prompt versioning functionality to use commits that take commit messages, in which you can describe how you’ve been iterating on your Prompts.

We’ve removed the need for names (and our auto-generated placeholder names) in favour of using explicit commit messages.

We’ll continue to improve version control and file type support over the coming weeks.

Let us know if you have any questions around these changes!


Online evaluators for monitoring Tools

February 14th, 2024

You can now use your online evaluators for monitoring the logs sent to your Tools. The results of this can be seen in the graphs on the Tool dashboard as well as on the Logs tab of the Tool.

To enable Online Evaluations, follow the steps in our Evaluate models online guide.


Logging token usage

February 14th, 2024

We’re now computing and storing the number of tokens used in both the requests to and responses from the model.

This information is available in the logs table UI and as part of the log response in the API. Furthermore, you can use the token counts as inputs to your code and LLM-based evaluators.

The number of tokens used in the request is called prompt_tokens and the number of tokens used in the response is called output_tokens.

This works consistently across all model providers, whether or not you are streaming the responses (OpenAI, for example, does not return token usage stats when in streaming mode).
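For example, a code evaluator could read the counts directly from the log it receives. Here’s a minimal sketch, assuming the log dict passed to the evaluator exposes the prompt_tokens and output_tokens fields named above:

# Hedged sketch of a code evaluator using the new token counts.
# Assumes the log dict exposes "prompt_tokens" and "output_tokens".
def evaluator(log):
    prompt_tokens = log.get("prompt_tokens") or 0
    output_tokens = log.get("output_tokens") or 0
    # Pass only when the response stayed within 4x the prompt length.
    return output_tokens <= 4 * prompt_tokens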


Prompt Version authorship

February 13th, 2024

You can now view who authored a Prompt Version.

Prompt Version authorship in the Prompt Version slideover

We’ve also introduced a popover with more Prompt Version details that appears when you mouse over a Prompt Version’s ID.

Prompt Version popover in the Logs slideover

Keep an eye out as we’ll be introducing this in more places across the app.


Filterable and sortable evaluations overview

February 9th, 2024

We’ve made improvements to the evaluations runs overview page to make it easier for your team to find interesting or important runs.

The charts have been updated to show a single datapoint per run. Each chart represents a single evaluator and shows the performance of the prompt tested in that run, so you can see at a glance how the performance of your prompt versions has evolved over time and visually spot the outliers. Datapoints are color-coded by the dataset used for the run.

The table is now paginated and does not load your entire project’s list of evaluation runs in a single page load. The page should therefore load faster for teams with a large number of runs.

The columns in the table are now filterable and sortable, allowing you to, for example, filter for just the completed runs that test two specific prompt versions on a specific dataset, sorted by their performance under a particular evaluator.

Here, we've filtered the table on completed runs that tested three specific prompt versions of interest, and sorted to show those with the worst performance on the Valid JSON evaluator.

Projects rename and file creation flow

February 8th, 2024

We’ve renamed Projects to Prompts and Tools as part of our move towards managing Prompts, Tools, Evaluators and Datasets as special-cased and strictly versioned files in your Humanloop directories.

This is a purely cosmetic change for now. Your Projects (now Prompts and Tools) will continue to behave exactly the same. This is the first step in a whole host of app layout, navigation and API improvements we have planned in the coming weeks.

If you are curious, please reach out to learn more.

New creation flow

We’ve also updated our file creation flow UI. When you go to create projects you’ll notice they are called Prompts now.


Control logging level

February 2nd, 2024

We’ve added a save flag to all of our endpoints that generate logs on Humanloop so that you can control whether the request and response payloads that may contain sensitive information are persisted on our servers or not.

If save is set to false, then no inputs, messages, or outputs of any kind (including the raw provider requests and responses) are stored on our servers. This can be helpful for sensitive use cases where you can’t, for example, risk PII leaving your system.

Details of the model configuration and any metadata you send are still stored. Therefore, you can still benefit from certain types of evaluators, such as human feedback, latency, and cost, and you can still track important metadata over time that may not contain sensitive information.

This includes all our chat and completion endpoint variations, as well as our explicit log endpoint.

from humanloop import Humanloop

# You need to initialize the Humanloop SDK with your API key
humanloop = Humanloop(api_key="<YOUR Humanloop API KEY>")

# humanloop.complete_deployed(...) will call the active model config on your project.
# You can optionally set the save flag to False
complete_response = humanloop.complete_deployed(
    save=False,
    project="<YOUR UNIQUE PROJECT NAME>",
    inputs={"question": "I have an inquiry about my life insurance policy. Can you help?"},
)

# You can still retrieve the data_id and output as normal
data_id = complete_response.data[0].id
output = complete_response.data[0].output

# And log end-user feedback that will still be stored
humanloop.feedback(data_id=data_id, type="rating", value="good")
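
The chat variations accept the same flag. Here’s a minimal sketch, assuming chat_deployed takes the same save and project parameters as the completion example above:

# Hedged sketch: call the deployed chat endpoint without persisting the
# request/response payloads (parameter names mirror the completion example above).
chat_response = humanloop.chat_deployed(
    save=False,
    project="<YOUR UNIQUE PROJECT NAME>",
    messages=[{"role": "user", "content": "What does my policy cover?"}],
)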

Logging provider request

February 2nd, 2024

We’re now capturing the raw provider request body alongside the existing provider response for all logs generated from our deployed endpoints.

This provides more transparency into how we map our provider-agnostic requests to specific providers. It can also be effective for troubleshooting cases where we return well-handled provider errors from our API.
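As an illustration, for an OpenAI chat model the captured request body looks roughly like the payload below; the exact fields depend on your model configuration:

# Illustrative shape of a raw provider request body captured on a log
# for an OpenAI chat model; the actual fields depend on your model config.
provider_request = {
    "model": "gpt-4",
    "temperature": 0.7,
    "messages": [
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "I have an inquiry about my life insurance policy. Can you help?"},
    ],
}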