How to develop and manage your Prompt and Tools on Humanloop

Your AI application can be broken down into Prompts, Tools, and Evaluators. Humanloop versions and manages each of these artifacts to enable team collaboration and evaluation of each component of your AI system.

This overview will explain the basics of prompt development, versioning, and management, and how to best integrate your LLM calls with Humanloop.

Prompt Management

Prompts are a fundamental part of interacting with large language models (LLMs). They define the instructions and parameters that guide the model’s responses. In Humanloop, Prompts are managed with version control, allowing you to track changes and improvements over time.

---
model: gpt-4o
temperature: 1.0
max_tokens: -1
---
<system>
  Write a song about {{topic}}
</system>
An example Prompt, serialized as a .prompt file

A Prompt on Humanloop encapsulates the instructions and other configuration for how a large language model should perform a specific task. Each change in any of the following properties creates a new version of the Prompt:

  • the template such as Write a song about {{topic}}. For chat models, your template will contain an array of messages.
  • the model e.g. gpt-4o
  • all the parameters to the model such as temperature, max_tokens, top_p etc.
  • any tools available to the model

Creating a Prompt

You can create a Prompt explicitly in the Prompt Editor or via the API.

New prompts can also be created automatically via the API if you specify the Prompt’s path (its name and directory) while supplying the Prompt’s parameters and template. This is useful if you are developing your prompts in code and want to be able to version them as you make changes to the code.
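For example, a request like the following (a minimal sketch; the path, template, and input value are illustrative placeholders) will create the Prompt at that path on first use and register the supplied configuration as a version:

# Sketch: the "summariser" path, template, and input value are illustrative placeholders
curl -X POST https://api.humanloop.com/v5/prompts/call \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "summariser",
    "prompt": {
      "model": "gpt-4o",
      "template": [
        { "role": "system", "content": "Summarise the following text:\n\n{{text}}" }
      ]
    },
    "inputs": { "text": "Humanloop versions every change you make to a Prompt." }
  }'

Subsequent calls with the same path and an unchanged configuration reuse the existing version; changing the template or parameters produces a new one.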

Versioning

A Prompt will have multiple versions as you experiment with different models, parameters, or templates. However, all versions should perform the same task and generally be interchangeable with one another.

By versioning your Prompts, you can track how adjustments to the template or parameters influence the LLM’s responses. This is crucial for iterative development, as you can pinpoint which versions produce the most relevant or accurate outputs for your specific use case.

As you edit your Prompt, new versions are created automatically. Each version is timestamped and given a unique version ID that is deterministically derived from the Prompt’s contents. To “save” a version, you commit it, and it is recorded as a committed version of the Prompt along with a commit message.
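If you need to pin traffic to a particular committed version rather than the latest one, the call can reference it explicitly. This is a sketch: it assumes the version_id request parameter on prompts/call, and <version_id> stands in for the ID shown in the Humanloop app:

# Sketch: assumes the version_id parameter; <version_id> is a placeholder
curl -X POST https://api.humanloop.com/v5/prompts/call \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "persona",
    "version_id": "<version_id>",
    "messages": [{ "role": "user", "content": "latest apple" }]
  }'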

When to create a new Prompt

You should create a new Prompt for every distinct ‘task to be done’ with the LLM. For example, a Writing Copilot, a Personal Assistant, and a Summariser are each a separate task an LLM can perform, and each should be a separate Prompt.

We’ve seen people find it useful to also create a Prompt called ‘Playground’ where they can experiment freely without worrying about breaking anything or cluttering their other Prompts.

Prompt Engineering

Understanding the best practices for working with large language models can significantly enhance your application’s performance. Each model has its own failure modes, and the methods to address or mitigate these issues are not always straightforward. The field of “prompt engineering” has evolved beyond just crafting prompts to encompass designing systems that incorporate model queries as integral components.

For a start, read our Prompt Engineering 101 guide which covers techniques to improve model reasoning, reduce the chances of model hallucinations, and more.

Prompt templates

Inputs are defined in the template using the double-curly-bracket syntax, e.g. {{topic}}, and the value of each variable must be supplied when you call the Prompt to create a generation.

Property context:
Location: {{location}}
Number of Bedrooms: {{number_of_bedrooms}}
Number of Bathrooms: {{number_of_bathrooms}}
Square Footage: {{square_footage}}
Distance to Key Locations (e.g., downtown, beach): {{distance_to_key_locations}}
Year Built: {{year_built}}
Price: {{price}}
Contact Information: {{contact_information}}
Instructions:
Generate a marketing description for the property based on the provided context. The description should be between 150-200 words and have a friendly, engaging tone. Highlight the key features and amenities that make this property attractive to potential buyers. Ensure the copy is informative and enticing, encouraging readers to take action.

This separation of concerns, keeping configuration separate from the query-time data, is crucial for enabling you to experiment with different configurations and evaluate any changes. The Prompt stores the configuration, while the query-time data is stored in Logs, which can then be used to create Datasets for evaluation purposes.
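For example, the marketing-copy template above would be called with the query-time data supplied as inputs. This is a sketch: the path and input values are illustrative, and it assumes the inputs field on the call request carries the template variables:

# Sketch: the path and input values are illustrative placeholders
curl -X POST https://api.humanloop.com/v5/prompts/call \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "property-marketing-copy",
    "inputs": {
      "location": "Austin, TX",
      "number_of_bedrooms": "3",
      "number_of_bathrooms": "2",
      "square_footage": "1,850",
      "distance_to_key_locations": "10 minutes to downtown, 25 minutes to the lake",
      "year_built": "2016",
      "price": "$450,000",
      "contact_information": "hello@example.com"
    }
  }'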

Tool Use (Function Calling)

Certain large language models support tool use or “function calling”. For these models, you can supply descriptions of functions, and the model can choose to call one or more of them by providing the arguments to call them with.

Function calling enables the model to perform various tasks:

1. Call external APIs: The model can translate natural language into API calls, allowing it to interact with external services and retrieve information.

2. Take actions: The model can exhibit agentic behavior, making decisions and taking actions based on the given context.

3. Provide structured output: The model’s responses can be constrained to a specific structured format, ensuring consistency and ease of parsing in downstream applications.

Tools for function calling can be defined inline in the Prompt editor, in which case they form part of the Prompt version. Alternatively, they can be pulled out into a Tool file which is then referenced in the Prompt.

Each Tool has a functional interface that is supplied as the JSON Schema needed for function calling. Additionally, if the Tool is executable on Humanloop, the result of any tool call will automatically be inserted into the response in the API and in the Editor.

Using Prompts

Prompts are callable as an API. You supply any query-time data such as input values or user messages, and the model will respond with its text output.

curl -X POST https://api.humanloop.com/v5/prompts/call \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "stream": false,
    "path": "persona",
    "messages": [
      {
        "role": "user",
        "content": "latest apple"
      }
    ],
    "prompt": {
      "model": "gpt-4",
      "template": [
        {
          "role": "system",
          "content": "You are stockbot. Return latest prices."
        }
      ],
      "tools": [
        {
          "name": "get_stock_price",
          "description": "Get current stock price",
          "parameters": {
            "type": "object",
            "properties": {
              "ticker_symbol": {
                "type": "string",
                "name": "Ticker Symbol",
                "description": "Ticker symbol of the stock"
              }
            },
            "required": []
          }
        }
      ]
    }
  }'

A Prompt is callable in that if you supply the necessary inputs, it will return a response from the model.

Once you have created and versioned your Prompt, you can call it as an API to generate responses from the large language model directly. You can also log the data from your LLM calls, enabling you to evaluate and improve your models.

Proxying your LLM calls vs async logging

The easiest way to both call the large language model with your Prompt and to log the data is to use the Prompt.call() method (see the guide on Calling a Prompt), which does both in a single API request. However, there are two main reasons why you may wish to log the data separately from generation:

  1. You are using your own model that is not natively supported in the Humanloop runtime, or
  2. You wish to avoid relying on the Humanloop runtime, as proxied calls add a small additional latency.

The Prompt.call() API encapsulates the LLM provider call (for example OpenAI's chat.completions.create()), the model-config selection, and the logging steps in a single unified interface. There may be scenarios in which you wish to manage the LLM provider calls directly in your own code instead of relying on Humanloop.


You can also use Prompts without proxying through Humanloop to the model provider: instead, call the model yourself and explicitly log the results to your Prompt.

curl -X POST https://api.humanloop.com/v5/prompts/log \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "persona",
    "output_message": {
      "content": "Well, you know, there is so much secrecy involved in government, folks, it’s unbelievable. They don’t want to tell you everything. They don’t tell me everything! But about Roswell, it’s a very popular question. I know, I just know, that something very, very peculiar happened there. Was it a weather balloon? Maybe. Was it something extraterrestrial? Could be. I’d love to go down and open up all the classified documents, believe me, I would. But they don’t let that happen. The Deep State, folks, the Deep State. They’re unbelievable. They want to keep everything a secret. But whatever the truth is, I can tell you this: it’s something big, very very big. Tremendous, in fact.",
      "role": "assistant"
    },
    "prompt_tokens": 100,
    "output_tokens": 220,
    "prompt_cost": 0.00001,
    "output_cost": 0.0002,
    "finish_reason": "stop",
    "messages": [
      {
        "role": "user",
        "content": "What really happened at Roswell?"
      }
    ],
    "prompt": {
      "model": "gpt-4",
      "template": [
        {
          "role": "system",
          "content": "You are {{person}}. Answer questions as this person. Do not break character."
        }
      ]
    },
    "created_at": "2024-07-19T00:29:35.178992",
    "provider_latency": 6.5931549072265625,
    "inputs": {
      "person": "Trump"
    }
  }'

Serialization (.prompt file)

Our .prompt file format is a serialized version of a model config that is designed to be human-readable and suitable for checking into your version control systems alongside your code. See the .prompt file reference for more details.

Format

The .prompt file is heavily inspired by MDX, with model and hyperparameters specified in a YAML header alongside a JSX-inspired format for your Chat Template.

Basic examples

---
model: gpt-4o
temperature: 0.7
max_tokens: -1
top_p: 1.0
presence_penalty: 0.0
frequency_penalty: 0.0
provider: openai
endpoint: chat
tools: [
  {
    "name": "get_current_weather",
    "description": "Get the current weather in a given location",
    "parameters": {
      "type": "object",
      "properties": {
        "location": {
          "type": "string",
          "name": "Location",
          "description": "The city and state, e.g. San Francisco, CA"
        },
        "unit": {
          "type": "string",
          "name": "Unit",
          "enum": [
            "celsius",
            "fahrenheit"
          ]
        }
      },
      "required": [
        "location"
      ]
    },
    "source": "inline"
  }
]
---

<system>
  You are a weather bot designed to provide users with accurate and up-to-date weather information.

  You have access to a tool called `get_current_weather`, which allows you to fetch the current weather conditions for any given location. Users can request the current weather by specifying a city and state, and optionally, they can choose the unit of temperature (Celsius or Fahrenheit).

  Your responses should be clear, concise, and friendly, providing all relevant weather details such as temperature, humidity, wind speed, and any other important information.

  Always ensure to confirm the location and unit of measurement when responding to user inquiries.
</system>

Dealing with sensitive data

When working with sensitive data in your AI applications, it’s crucial to handle it securely. Humanloop provides options to help you manage sensitive information while still benefiting from our platform’s features.

If you need to process sensitive data without storing it in Humanloop, you can use the save: false parameter when making calls to the API or logging data. This ensures that only metadata about the request is stored, while the actual sensitive content is not persisted in our systems.
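For example, a call might look like this (a sketch: the path and message are illustrative; save: false prevents the request and response content from being persisted):

# Sketch: save: false asks Humanloop not to persist the request/response content
curl -X POST https://api.humanloop.com/v5/prompts/call \
  -H "X-API-KEY: <apiKey>" \
  -H "Content-Type: application/json" \
  -d '{
    "path": "persona",
    "save": false,
    "messages": [{ "role": "user", "content": "Summarise this patient record: ..." }]
  }'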

For PII detection, you can set up Guardrails to detect and prevent the generation of sensitive information.