Instrument and monitor multi-step AI systems.

Introduction

LLM-powered systems are multi-step processes: they draw on external information sources, delegate computation to tools, and iterate with the LLM to generate the final answer.

Looking only at the inputs and output of such a system is not enough to reason about its behavior. Flows address this by tracing every component of the feature, unifying their Logs into a comprehensive view of the system.

Basics

To integrate Flows, start by creating a logging trace.

Question answering agent
@humanloop.flow(
    path="QA Agent/Call Agent",
    attributes={"version": "v1", "wikipedia": True}
)
def call_agent(query: str) -> str:
    """A simple question answering agent."""

    # Agent logic goes here
    # ...

    return answer

With the Humanloop SDK capturing the agent's inputs and output, you can already begin evaluating the system's performance through code or through the platform UI.

Further Logs can be added to the trace, providing more context when investigating the system's behavior.

Tracing

When creating a Log, use the trace_parent_id attribute to link it to another Log in a trace, forming a chain of related actions.
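For example, if you create Logs through the SDK client rather than through decorators, you can pass the parent Flow Log's id explicitly. This is a minimal sketch, assuming the SDK's flows.log and prompts.log methods and their trace_parent_id parameter; the model, inputs, and variable names are placeholders. The decorators in the example below take care of this linking for you.

Linking a Log to a trace
# Open the trace by creating a Flow Log (assumed SDK call).
flow_log = humanloop.flows.log(
    path="QA Agent/Call Agent",
    flow={"attributes": {"version": "v1", "wikipedia": True}},
    inputs={"query": query},
)

# Link a Prompt Log to the trace via trace_parent_id (assumed parameter).
humanloop.prompts.log(
    path="QA Agent/Call Model",
    prompt={"model": "gpt-4o"},  # placeholder model
    messages=messages,
    output=answer,
    trace_parent_id=flow_log.id,
)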

Question answering agent
@humanloop.tool(path="QA Agent/Search Wikipedia")
def search_wikipedia(query: str) -> dict:
    """The LLM calls this tool to search Wikipedia."""
    ...


@humanloop.prompt(path="QA Agent/Call Model")
def call_model(messages: list[dict]) -> dict:
    """Interact with the LLM."""
    ...


@humanloop.flow(
    path="QA Agent/Call Agent",
    attributes={"version": "v1", "wikipedia": True}
)
def call_agent(query: str) -> str:
    """A simple question answering agent."""
    ...

In the scenario above, call_agent calls call_model multiple times to refine responses. Through function calls to search_wikipedia, the LLM queries an external source to provide factual answers.

Calling the other decorated functions inside call_agent creates Logs and adds them to the trace opened by call_agent.

Tracing the agent allows us to see the steps taken by the model to produce the answer.

Versioning

Flow versioning is managed through the attributes field, which acts as your feature’s manifest.

Question answering agent
@humanloop.flow(
    path="QA Agent/Call Agent",
    attributes={"version": "v1", "wikipedia": True}
)
def call_agent(query: str) -> str:
    """A simple question answering agent."""
    ...

Observability

You must mark a trace as complete once all relevant Logs have been added. This triggers the monitoring Evaluators to evaluate the Flow Log.
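If you manage the trace through the SDK client rather than the @humanloop.flow decorator, you close it explicitly. A minimal sketch, assuming the SDK exposes a flows.update_log method with a trace_status parameter:

Marking a trace complete
# Assumed call: update the Flow Log that opened the trace
# (flow_log is the Log created in the Tracing section sketch).
humanloop.flows.update_log(
    log_id=flow_log.id,
    trace_status="complete",
)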

Evaluators can access nested Logs via the children attribute, allowing you to evaluate individual steps or the entire workflow.

Count Logs in a trace
def count_logs_evaluator(log):
    """Count the number of Logs in a trace."""
    if log["children"]:
        return 1 + sum(count_logs_evaluator(child) for child in log["children"])
    return 1

Evaluation

Unlike Prompts, which can be evaluated on the Humanloop runtime, Flows must be evaluated on your side, in code.

To do this, provide a callable in the file argument of the evaluations.run method, as shown below.

Evaluating a Flow
humanloop.evaluations.run(
    name="Comprehensiveness Evaluation",
    file={
        "path": "QA Agent/Call Agent",
        "callable": call_agent,
    },
    evaluators=[
        {"path": "QA Agent/Answer Comprehensiveness"},
    ],
    dataset={"path": "QA Agent/Simple Answers"},
)

Flow-level metrics

A Flow Log's start and end times are derived from the earliest and latest Logs in its trace.

Metrics such as cost and token usage are aggregated across all Logs in the trace.
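Conceptually, the aggregation walks the same children structure the evaluator above uses. A sketch with a hypothetical cost field on each Log; the actual field names on Humanloop Logs may differ:

Summing cost across a trace
def total_cost(log) -> float:
    """Recursively sum a hypothetical 'cost' field over a Flow Log and its children."""
    cost = log.get("cost") or 0.0
    for child in log.get("children", []):
        cost += total_cost(child)
    return cost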

Next steps

You now understand the role of Flows in the Humanloop ecosystem. Explore the following resources to apply Flows to your AI project:

  • Check out our logging quickstart for an example project instrumented with Flows.

  • Dive into the evals guide to learn how to evaluate your AI project.
