Set up LLM as a Judge
In this guide, we will set up an LLM evaluator to check for PII (Personally Identifiable Information) in Logs.
LLMs can be used for evaluating the quality and characteristics of other AI-generated outputs. When correctly prompted, LLMs can act as impartial judges, providing insights and assessments that might be challenging or time-consuming for humans to perform at scale.
In this guide, we’ll explore how to set up an LLM as an AI Evaluator in Humanloop and use it to assess aspects of AI-generated content, such as the presence of Personally Identifiable Information (PII).
An AI Evaluator is a Prompt that takes attributes from a generated Log (and optionally from a testcase Datapoint if comparing to expected results) as context and returns a judgement. The judgement is a boolean or a number that measures some criterion of the generated Log, as defined in the Prompt instructions.
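To make the idea of a boolean or numeric judgement concrete, here is a minimal Python sketch of turning a judge model's raw text reply into a typed value. How Humanloop itself parses the reply is not described in this guide, so the function name `parse_judgement` and the accepted spellings ("true", "yes", "false", "no") are assumptions made purely for illustration.

```python
def parse_judgement(reply: str):
    """Convert a judge model's raw text reply into a bool or a float.

    Hypothetical helper: the accepted spellings below are an assumption,
    not a description of how Humanloop parses Evaluator output.
    """
    text = reply.strip().lower()
    if text in ("true", "yes"):
        return True
    if text in ("false", "no"):
        return False
    # Anything else is treated as a numeric score; float() raises
    # ValueError if the reply is neither a boolean word nor a number.
    return float(text)


contains_pii = parse_judgement(" True\n")   # boolean judgement
quality_score = parse_judgement("0.75")     # numeric judgement
```

Keeping the judge's output this constrained is a design choice: a reply that is exactly one boolean or number is easy to validate and aggregate, whereas free-form text would need further interpretation.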
You should have an existing Prompt to evaluate and should already have generated some Logs. If not, follow our guide on creating a Prompt.
In this example we will use a simple Support Agent Prompt that answers user queries about Humanloop’s product and docs.
Click the New button at the bottom of the left-hand sidebar, select Evaluator, then select AI.
Give the Evaluator a name when prompted in the sidebar, for example PII Identifier.
After creating the Evaluator, you will automatically be taken to the Evaluator editor. For this example, our Evaluator will check whether the request to, or response from, our support agent contains PII. The Evaluator acts as a Guardrail, helping us spot issues in our agent workflow.
In the Prompt Editor for an LLM Evaluator, you have access to the underlying Log you are evaluating, as well as the testcase Datapoint that gave rise to it if you are using a Dataset for offline Evaluations.
These are accessed with the standard {{ variable }} syntax, enhanced with a familiar dot notation to pick out specific values from inside the log and testcase objects.
For example, suppose you are evaluating a Log whose inputs include a query field.
In the LLM Evaluator Prompt, {{ log.inputs.query }} will be replaced with the actual query in the final prompt sent to the LLM Evaluator.
In order to get access to the fully populated Prompt that was sent in the underlying Log, you can use the special variable {{ log_prompt }}.
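The substitution described above can be sketched in a few lines of Python. This is an illustration of the idea only, not Humanloop's actual templating engine: the `render` and `resolve` helpers, the shape of the `log` dictionary, and the sample values are all hypothetical, and `log_prompt` is modelled here simply as a top-level string.

```python
import re

# Hypothetical Log contents; only the inputs.query path comes from this guide.
log = {
    "inputs": {"query": "Can you update my email to jane.doe@example.com?"},
    "output": "Sure, I have passed that on to our support team.",
}

# Hypothetical stand-in for the fully populated Prompt sent in the Log.
log_prompt = "You are a support agent. Answer: " + log["inputs"]["query"]

# Matches {{ name }} or {{ dotted.path }}, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(path: str, context: dict):
    """Follow a dotted path such as 'log.inputs.query' through nested dicts."""
    value = context
    for key in path.split("."):
        value = value[key]  # raises KeyError if the path does not exist
    return value


def render(template: str, context: dict) -> str:
    """Replace every {{ path }} placeholder with its value from context."""
    return PLACEHOLDER.sub(lambda m: str(resolve(m.group(1), context)), template)


template = (
    "Does the user query below contain PII? Answer True or False.\n"
    "Query: {{ log.inputs.query }}\n"
    "Full prompt sent to the model: {{ log_prompt }}"
)

rendered = render(template, {"log": log, "log_prompt": log_prompt})
print(rendered)
```

In this sketch a missing path raises a `KeyError` rather than silently rendering an empty string, which makes a mistyped variable name easy to spot while drafting an Evaluator Prompt.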

