December

Improved TypeScript SDK Evals

December 18th, 2024

We’ve enhanced our TypeScript SDK with an evaluation utility, similar to the one in our Python SDK.

The utility can run evaluations on either your runtime or Humanloop’s. To use your local runtime, you need to provide:

  • A callable function that takes your inputs/messages
  • A Dataset of inputs/messages to evaluate the function against
  • A set of Evaluators to provide judgments on your function’s outputs

Here’s how our evals in code guide looks in the new TypeScript SDK:

import { Humanloop } from "humanloop";

// Get API key at https://app.humanloop.com/account/api-keys
const hl = new Humanloop({
  apiKey: "<YOUR HUMANLOOP API KEY>",
});

const checks = hl.evaluations.run({
  name: "Initial Test",
  file: {
    path: "Scifi/App",
    callable: (messages: { role: string; content: string }[]) => {
      // Replace with your AI model logic
      const lastMessageContent = messages[messages.length - 1].content.toLowerCase();
      return lastMessageContent === "hal"
        ? "I'm sorry, Dave. I'm afraid I can't do that."
        : "Beep boop!";
    },
  },
  dataset: {
    path: "Scifi/Tests",
    // Replace with your own dataset
    datapoints: [
      {
        messages: [
          { role: "system", content: "You are an AI that responds like famous sci-fi AIs." },
          { role: "user", content: "HAL" },
        ],
        target: {
          output: "I'm sorry, Dave. I'm afraid I can't do that.",
        },
      },
      {
        messages: [
          { role: "system", content: "You are an AI that responds like famous sci-fi AIs." },
          { role: "user", content: "R2D2" },
        ],
        target: {
          output: "Beep boop beep!",
        },
      },
    ],
  },
  // Replace with your own Evaluators
  evaluators: [
    { path: "Example Evaluators/Code/Exact match" },
    { path: "Example Evaluators/Code/Latency" },
    { path: "Example Evaluators/AI/Semantic similarity" },
  ],
});

console.log("Evaluation checks:", checks);

Check out this cookbook example to learn how to evaluate a RAG pipeline with the new SDK.

SDK Decorators in TypeScript [beta]

December 17th, 2024

We’re excited to announce the beta release of logging utilities in our TypeScript SDK, matching the Python logging utilities we introduced last month.

The new utilities help you integrate Humanloop with minimal changes to your existing code base.

Take this basic chat agent instrumented through Humanloop:

const callModel = async (traceId: string, messages: MessageType[]) => {
  const response = await openAIClient.chat.completions.create({
    model: "gpt-4o",
    temperature: 0.8,
    messages: messages,
  });

  const output = response.choices[0].message.content || "";

  await humanloop.prompts.log({
    path: "Chat Agent/Call Model",
    prompt: {
      model: "gpt-4o",
      messages: [...messages, { role: "assistant", content: output }],
      temperature: 0.8,
    },
    traceParentId: traceId,
  });

  return output;
};

const chatAgent = async () => {
  const traceId = (
    await humanloop.flows.log({
      path: "Chat Agent/Agent",
    })
  ).id;
  const messages = [{ role: "system", content: "You are a helpful assistant." }];

  while (true) {
    const userMessage = await getCLIInput();

    if (userMessage === "exit") {
      break;
    }

    messages.push({ role: "user", content: userMessage });
    const response = await callModel(traceId, messages);
    messages.push({ role: "assistant", content: response });
  }

  await humanloop.flows.updateLog(traceId, {
    traceStatus: "complete",
    messages: messages,
  });
};

With the new logging utilities, the SDK automatically manages the Files and logging for you, so you can integrate Humanloop into your project with fewer changes to your existing codebase.

Calling a function wrapped in a utility creates a Log on Humanloop. The SDK also detects changes to the LLM hyperparameters and automatically creates a new version.

The code below is equivalent to the previous example:

const callModel = (messages: MessageType[]) =>
  humanloop.prompt({
    path: "Chat Agent/Call Model",
    callable: async (inputs: any, messages: MessageType[]) => {
      const response = await openAIClient.chat.completions.create({
        model: "gpt-4o",
        temperature: 0.8,
        messages: messages,
      });

      return response.choices[0].message.content || "";
    },
  })(undefined, messages);

const chatAgent = () =>
  humanloop.flow({
    path: "Chat Agent/Agent",
    callable: async (inputs: any, _messages: MessageType[]) => {
      const messages = [{ role: "system", content: "You are a helpful assistant." }];

      while (true) {
        const userMessage = await getCLIInput();

        if (userMessage === "exit") {
          break;
        }

        messages.push({ role: "user", content: userMessage });
        const response = await callModel(messages);
        messages.push({ role: "assistant", content: response });
      }

      return messages;
    },
  })(undefined, []);

This release introduces three decorators:

  • flow(): Serves as the entry point for your AI features. Use it to call other decorated functions and trace your feature’s execution.

  • prompt(): Monitors LLM client library calls to version your Prompt Files. Supports OpenAI, Anthropic, and Replicate clients. Changing the provider or hyperparameters creates a new version in Humanloop.

  • tool(): Versions Tools using their source code. Includes a jsonSchema utility to streamline function calling (see the sketch below).
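
As a rough illustration of the pattern, here is a minimal sketch of wrapping a plain function with the tool() utility, mirroring the callable style of the prompt() and flow() examples above. The calculator function and its path are made up for illustration, the exact callable signature may differ in the released SDK, and humanloop is assumed to be an initialized client.

// Illustrative sketch only: wraps a simple calculator so it is versioned
// by its source code, mirroring the prompt()/flow() pattern in this post.
const addNumbers = (inputs: { a: number; b: number }) =>
  humanloop.tool({
    path: "Chat Agent/Calculator", // hypothetical path
    callable: (inputs: { a: number; b: number }) => inputs.a + inputs.b,
  })(inputs);

// Calling addNumbers(...) would create a Tool Log on Humanloop; editing
// the function body would produce a new Tool version.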

Explore our cookbook example to see a simple chat agent instrumented with the new logging utilities.

Function-calling AI Evaluators

December 15th, 2024

We’ve updated our AI Evaluators to use function calling by default, improving their reliability and performance. We’ve also updated the AI Evaluator Editor to support this change.

AI Evaluator Editor with function calling

New AI Evaluators will use function calling by default. When you create an AI Evaluator in Humanloop, it now comes with a submit_judgment(judgment, reasoning) tool that takes a judgment and its reasoning as arguments. When you run the Evaluator on a Log, Humanloop forces the model to call this tool, and the model returns an appropriate judgment alongside its reasoning.
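
For reference, the tool the model is forced to call looks conceptually like a standard function-calling definition. The sketch below is hand-written for illustration, using the argument names from the description above, rather than the exact definition Humanloop generates.

// Illustrative only: roughly how a submit_judgment tool could be expressed
// as an OpenAI-style function-calling definition.
const submitJudgmentTool = {
  type: "function",
  function: {
    name: "submit_judgment",
    description: "Submit a judgment for the Log being evaluated.",
    parameters: {
      type: "object",
      properties: {
        judgment: {
          type: "string",
          description: "The judgment for the Log, e.g. one of the configured options.",
        },
        reasoning: {
          type: "string",
          description: "The reasoning behind the judgment (can be disabled in the Editor).",
        },
      },
      required: ["judgment", "reasoning"],
    },
  },
};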

You can customize the AI Evaluator in its Editor tab. Here, Humanloop displays a “Parameters” and a “Template” section, similar to the Prompt Editor, allowing you to define the messages and parameters used to call the model. In the “Judgment” section below those, you can customize the function descriptions and disable the reasoning argument.

To test the AI Evaluator, you can load Logs from a Prompt with the Select a Prompt or Dataset button in the Debug console panel. After Logs are loaded, click the Run button to run the AI Evaluator on the Logs. The resulting judgments will be shown beside the Logs. If reasoning is enabled, you can view the reasoning by hovering over the judgment or by clicking the Open in drawer button next to the judgment.

New models: Gemini 2.0 Flash, Llama 3.3 70B

December 12th, 2024

To help you adopt the latest models, we’ve added support for several new ones, including the latest experimental Gemini models.

These include gemini-2.0-flash-exp, which offers better performance than Gemini 1.5 Pro along with tool use, and gemini-exp-1206, the latest experimental advanced model.

We’ve also added support for Llama 3.3 70B on Groq, Meta’s latest model with performance comparable to their largest Llama 3.1 405B model.

You can start using these models in your Prompts by going to the Editor and selecting the model from the dropdown. (To use the Gemini models, you need to have a Google API key saved in your Humanloop account settings.)

Gemini 2.0 Flash in Prompt Editor
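
If you prefer to call the new models from code rather than the Editor, a call along the lines of the sketch below should work with the TypeScript SDK’s prompts.call method, analogous to the Python example further down this page. The path and messages are placeholders, and the exact request field names may vary between SDK versions.

// Sketch: calling a Prompt with the new Gemini model via the SDK.
// Assumes `hl` is an initialized Humanloop client and a Google API key
// is saved in your Humanloop account settings.
const askGemini = async () => {
  const response = await hl.prompts.call({
    path: "Examples/Gemini Test", // placeholder path
    prompt: {
      model: "gemini-2.0-flash-exp",
      template: [{ role: "system", content: "You are a helpful assistant." }],
    },
    messages: [{ role: "user", content: "Hello!" }],
  });
  console.log(response);
};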

Drag and Drop in the Sidebar

December 9th, 2024

You can now drag and drop files into the sidebar to organize your Prompts, Evaluators, Datasets, and Flows into Directories.

With this much-requested feature, you can easily reorganize your workspace hierarchy without having to use the ‘Move…’ modals.

This improvement makes it easier to maintain a clean and organized workspace. We recommend using a Directory per project to group together related files.

Logs with user-defined IDs

December 6th, 2024

We’ve added the ability to create Logs with your own unique ID, which you can then use to reference the Log when making API calls to Humanloop.

my_id = "my_very_own_and_unique_id"
# create Log with "my_very_own_and_unique_id" id
humanloop.prompts.call(
    path="path_to_the_prompt",
    prompt={
        "model": "gpt-4",
        "template": [
            {
                "role": "system",
                "content": "You are a helpful assistant. Tell the truth, the whole truth, and nothing but the truth",
            },
        ],
    },
    log_id=my_id,
    messages=[{"role": "user", "content": "Is it acceptable to put pineapples on pizza?"}],
)
# add evaluator judgment to this Log using your own id
humanloop.evaluators.log(
    parent_id=my_id,
    path="path_to_my_evaluator",
    judgment="good",
    spec={
        "arguments_type": "target_free",
        "return_type": "select",
        "evaluator_type": "human",
        "options": [{"name": "bad", "valence": "negative"}, {"name": "good", "valence": "positive"}],
    },
)

This is particularly useful for providing judgments on the Logs without requiring you to store Humanloop-generated IDs in your application.

Flow Trace in Review View

December 3rd, 2024

We’ve added the ability to see the full Flow trace directly in the Review view. This is useful to get the full context of what was called during the execution of a Flow.

To open the Log drawer side panel, click on the Log ID above the Log output in the Review view.
