Structured Outputs: Everything You Should Know

By Conor Kelly, Growth

How can you ensure your large language models (LLMs) consistently produce outputs in the correct format - especially when performing tasks or calling APIs? What can you do to prevent your LLM from generating unpredictable or incomplete responses that can break your application?

LLMs often produce unstructured outputs that require extensive post-processing before they can be used effectively. This unpredictability leads to errors, wasted time, and increased costs.

OpenAI and Google introduced structured outputs to solve this problem. Structured outputs ensure model responses follow a strict format, reduce errors and make it easier to integrate LLMs into applications requiring consistent, machine-readable data.

This guide will explain how structured outputs work, how they can be implemented in OpenAI and Gemini models, the benefits they offer, and what challenges you could face when using them.

What are Structured Outputs?

Structured outputs ensure model-generated responses follow pre-defined formats, such as JSON, XML or Markdown. Previously, LLMs would generate responses as free-form text without a specific structure. Structured outputs provide an alternative where outputs are machine-readable, consistent, and can be easily integrated into other systems.

Free-form text can often be ambiguous and difficult to parse, whereas structured outputs allow for direct integration with software systems without needing complex data transformations.

Constraining the output to a specific format reduces variability and errors, making the outputs more reliable for tasks where consistency and accuracy are key, like API interactions or database updates.

OpenAI introduced Structured Outputs as a wider-scoped and vastly improved version of JSON mode, which was released in 2023 to ensure models would output JSON once instructed. The problem with JSON mode is that it doesn’t consistently output the correct schema. Structured outputs enforce a schema, making integrations with APIs and other tools more reliable, and reducing the chance that the model generates incorrect or irrelevant content.

According to OpenAI, getting LLMs to respond in a specific format via prompt engineering was around 35.9% reliable before structured outputs. Now, it’s 100% reliable (if strict is set to true).

Structured outputs lead to 100% consistency in JSON schema formatting. Source: OpenAI

How do Structured Outputs Work?

Typically, LLMs generate text token by token based on probabilistic predictions. However, this method isn’t suitable when text needs to be generated in a specific format. Structured outputs guide the process with predefined rules or schemas, so each token adheres to the required structure. To monitor and control the sequence of token generation, techniques like finite state machines (FSMs) are commonly used.

Overview of how LLMs can return structured outputs that conform to a specific schema. Source: Langchain
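To make the FSM idea concrete, here is a toy sketch (not a real decoder, and with a made-up five-token vocabulary) of how a state machine can mask which tokens the model may emit next, so the output can never leave a valid JSON-like shape:

```python
# Toy sketch: each state lists the only tokens allowed next, so even a
# naive token picker cannot produce output that breaks the format.
FSM = {
    "start": {'{': "key"},
    "key": {'"a"': "colon"},
    "colon": {':': "value"},
    "value": {'1': "end"},
    "end": {'}': "done"},
}

def constrained_generate(propose):
    """propose(allowed) stands in for the model picking a token,
    masked so it can only choose from the allowed set."""
    state, out = "start", []
    while state != "done":
        allowed = list(FSM[state])
        token = propose(allowed)  # the model's choice is restricted here
        out.append(token)
        state = FSM[state][token]
    return "".join(out)

# Even a trivial "model" that always takes the first option emits valid JSON.
result = constrained_generate(lambda allowed: allowed[0])
```

Real implementations apply the same masking idea over the model’s full vocabulary at every decoding step, derived from the JSON schema rather than hand-written states.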

To leverage Structured Outputs using model providers like OpenAI and Gemini, you need to do the following:

  1. Define a JSON Schema: A JSON Schema provides a standardized format for defining the structure and data types (e.g., strings, numbers, arrays) expected in a JSON document.
  2. Incorporate the schema in API requests: The JSON schema is included in the request configuration, instructing the model to generate outputs that match the specified schema.
  3. Generate structured data: Once the request is processed, the LLM generates an output that fits within the constraints of the defined schema, so each response is consistent and follows the expected format.
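As a minimal sketch of these steps (the schema and model output below are hypothetical examples, not from a live API call), a JSON Schema pins down exactly which fields a response must contain:

```python
import json

# Step 1: a minimal JSON Schema describing the shape a response should
# follow — an object with a name and a list of ingredient strings.
recipe_schema = {
    "type": "object",
    "properties": {
        "recipe_name": {"type": "string"},
        "ingredients": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["recipe_name", "ingredients"],
    "additionalProperties": False,
}

# Steps 2-3: once the schema is supplied with the request, the model's
# output conforms to it. A conforming output might look like this:
raw_output = '{"recipe_name": "Shortbread", "ingredients": ["butter", "sugar", "flour"]}'
parsed = json.loads(raw_output)

# Downstream code can now rely on exactly these fields being present.
assert set(parsed) == set(recipe_schema["required"])
```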

The following guides detail how you can set this up in OpenAI, Gemini and Humanloop.

How to use Structured Outputs on OpenAI

With the introduction of OpenAI structured outputs, you can generate structured data for use cases like dynamically generating user interfaces.

However, it’s worth noting that structured outputs are only available with OpenAI’s latest LLMs, which currently include:

  • gpt-4o-mini-2024-07-18 and later
  • gpt-4o-2024-08-06 and later

With that being said, here’s how you can start using structured outputs with gpt-4o and gpt-4o-mini via the response_format parameter and the SDK helper:

1. Define the Object

Firstly, define the object or data structure that represents the JSON Schema that the model should follow. This ensures the model adheres to a specific structure, such as a list of steps.

An example of how you might do this using Pydantic:

from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

The ‘Step’ class defines each step in a math problem with two fields:

  • ‘explanation’: A string explaining the step
  • ‘output’: The result at that step

The ‘MathResponse’ class defines the overall response, which includes a list of steps (‘steps’) and a ‘final_answer’.
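As a quick check that these models behave as intended, they can parse a sample JSON payload directly (the payload below is a hypothetical example, assuming the Pydantic v2 API):

```python
from pydantic import BaseModel

class Step(BaseModel):
    explanation: str
    output: str

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

# A hypothetical payload shaped like the schema above.
payload = (
    '{"steps": [{"explanation": "Subtract 7 from both sides", '
    '"output": "8x = -30"}], "final_answer": "x = -15/4"}'
)

# model_validate_json parses and validates in one step; a payload that
# doesn't match the schema would raise a ValidationError instead.
resp = MathResponse.model_validate_json(payload)
```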

2. Incorporate the Object in the API Call

Once you’ve defined your object, you can supply it in the API request using the ‘response_format’ parameter. The SDK will automatically parse the model’s output into the object you defined.

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
        {"role": "user", "content": "how can I solve 8x + 7 = -23"}
    ],
    response_format=MathResponse
  )

Here, OpenAI’s GPT-4o is being used to solve a math problem. The ‘response_format’ ensures the model’s output conforms to the ‘MathResponse’ schema. Additionally, the SDK handles parsing the response into an object that matches your schema.

3. Handle Edge Cases

Occasionally, the model might not generate a valid response that matches the provided JSON Schema.

This can happen if:

  • The model refuses to answer because of safety reasons
  • The response is incomplete due to reaching a token limit

An example of how you can handle edge cases:

import openai

try:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": "You are a helpful math tutor. Guide the user through the solution step by step."},
            {"role": "user", "content": "how can I solve 8x + 7 = -23"}
        ],
        response_format=MathResponse,
        max_tokens=50
    )
    math_response = completion.choices[0].message
    if math_response.parsed:
        print(math_response.parsed)
    elif math_response.refusal:
        # Handle refusal
        print(math_response.refusal)
except Exception as e:
    if isinstance(e, openai.LengthFinishReasonError):
        # Retry with a higher max_tokens
        print("Too many tokens: ", e)
    else:
        # Handle other exceptions
        print(e)

The code checks for different edge cases, such as incomplete responses due to token limits or safety reasons. If a refusal happens, it prints an explanation. Otherwise, it processes and prints the structured output.

4. Use Structured Data in a Type-Safe Way

When using structured outputs, you can access the parsed JSON response as an object of the type you defined in ‘response_format’. This ensures type safety and lets you directly work with structured data.

Example:

math_response = completion.choices[0].message.parsed
print(math_response.steps)
print(math_response.final_answer)

How to use Structured Outputs on Gemini

The text generated by Gemini is unstructured by default, so you’ll want to look into using Gemini’s structured output functionality.

Here’s how you can use them on the Gemini API:

1. Set Up Your Project and API Key

Before you start, ensure your project is set up and the API key is configured. This is necessary to authenticate your requests to the Gemini API.

2. Define Your JSON Schema and Supply It to the Model

You’ll need to define a JSON schema that specifies the structure and data types for the output.

Afterwards, you’ll want to supply it to the model. There are two ways to do so:

  1. As text in the prompt: You can include a description of your desired JSON format directly in the prompt.
model = genai.GenerativeModel("gemini-1.5-pro-latest")
prompt = """List a few popular cookie recipes in JSON format.

Use this JSON schema:

Recipe = {'recipe_name': str, 'ingredients': list[str]}
Return: list[Recipe]"""
result = model.generate_content(prompt)
print(result)
  2. Through model configuration: This is a more formal method where you configure the model with a specific schema using ‘response_schema’.
import typing_extensions as typing

class Recipe(typing.TypedDict):
    recipe_name: str
    ingredients: list[str]

model = genai.GenerativeModel("gemini-1.5-pro-latest")
result = model.generate_content(
    "List a few popular cookie recipes.",
    generation_config=genai.GenerationConfig(
        response_mime_type="application/json", response_schema=list[Recipe]
    ),
)
print(result)
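Because ‘response_mime_type’ is set to “application/json”, the text of the response can be parsed directly; the contents below are a hypothetical example of what result.text might hold:

```python
import json

# result.text from the call above would hold a JSON string matching the
# Recipe schema; this is a hypothetical example of its contents.
text = '[{"recipe_name": "Gingersnaps", "ingredients": ["ginger", "flour"]}]'
recipes = json.loads(text)

# Each entry is guaranteed to have the fields defined in the TypedDict.
print(recipes[0]["recipe_name"])
```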

3. Use Enums for Constrained Outputs

You may need to constrain outputs to specific options, which is useful for applications like classification tasks. In this case, you’ll need to use enums in your schemas.

import enum

import google.generativeai as genai
import PIL.Image

class Choice(enum.Enum):
    PERCUSSION = "Percussion"
    STRING = "String"
    WOODWIND = "Woodwind"
    BRASS = "Brass"
    KEYBOARD = "Keyboard"

organ = PIL.Image.open("organ.jpg")  # Load the image to classify
model = genai.GenerativeModel("gemini-1.5-pro-latest")
result = model.generate_content(
    ["What kind of instrument is this?", organ],
    generation_config=genai.GenerationConfig(
        response_mime_type="text/x.enum", response_schema=Choice
    ),
)
print(result)  # Output will be one of the enum options

How to use Structured Outputs in Humanloop

Prompt management on Humanloop streamlines the prompt development process, making it easier to version, evaluate and collaborate on prompts.

In Humanloop, you can use structured outputs by:

  1. Create or select a Prompt

  2. Open the Editor tab

  3. Select the Response Format dropdown

  4. Add a JSON schema (manually or using the AI-powered JSON schema generator)

You can then deploy the Humanloop API to your app or agent to leverage structured outputs.

Structured outputs in the Humanloop prompt editor. Source: Humanloop

Benefits of Structured Outputs

A few benefits of using structured outputs include:

  • Reduced hallucinations: Enforcing adherence to a schema makes unexpected data less likely to appear, so only relevant information is included in the output. This also makes it easier to evaluate LLM applications, as structured outputs provide predictable and verifiable data formats.
  • Seamless integration: Ensuring model outputs consistently match a predefined schema simplifies integration with other systems. This is especially useful for applications that depend on structured data formats, like databases or APIs, where consistency is essential for smooth operation.
  • Reduced variability: Structured outputs limit the model’s ability to deviate from the specified format, leading to reduced variability. It also makes validation easier because the output is guaranteed to match the schema. As a result, you may not need to use complex post-processing logic, since you can rely on the schema to ensure all the required fields are present and correctly formatted.

Challenges of Structured Outputs

Despite the various benefits, there are some challenges you’ll need to overcome when working with structured outputs:

  • Schema design complexities: You might find yourself working with schemas that support complex, nested structures. Designing an effective JSON schema can be a time-consuming and complex process. One example is if you’re extracting structured information from legal documents or multi-step processes, you might need to define deeply nested schemas that capture all the necessary details without introducing errors.
  • Capped outputs: There is an output limit of 16,384 tokens, meaning that any larger output is cut off and results in invalid JSON. Consequently, this leads to issues when parsing or using the data in downstream systems. For example, if you were generating a long list of objects like transaction details, only part of it would be returned before hitting the token limit.
  • Reduced reasoning capabilities: Although structured outputs help reduce hallucinations, research has shown there might be a decline in the LLM’s reasoning capabilities compared to using free-form responses. This can be seen in the example below.
Using structured outputs may result in reducing reasoning capabilities due to restrictions. Source: Rui Tam et al.
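The truncation problem described above can at least be detected cheaply before the data reaches downstream systems; the response string here is a hypothetical example of a capped output:

```python
import json

# A response cut off mid-generation by the token cap ends mid-object
# and is therefore not valid JSON.
truncated = '{"transactions": [{"id": 1, "amount": 9.99}, {"id": 2, "amo'

def parse_or_flag(raw):
    """Return (data, True) for valid JSON, (None, False) for a truncated reply."""
    try:
        return json.loads(raw), True
    except json.JSONDecodeError:
        return None, False  # caller can retry, e.g. requesting fewer items

data, ok = parse_or_flag(truncated)
```

A guard like this lets you retry with a smaller request (or a paginated one) instead of passing broken data downstream.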

Learn More About Structured Outputs

Structured outputs offer a powerful solution to ensure your LLMs generate reliable, predictable, and machine-readable responses. Whether you’re working with OpenAI or Gemini models, structured outputs help you maintain consistent data formats across applications, making it easier to manage complex workflows and extract actionable insights.

Humanloop helps enterprises follow best practices to implement structured outputs for use cases like information extraction and data validation.

To find out more about how you can use structured outputs for reduced hallucinations in your LLM responses, book a demo today.

About the author

Conor Kelly
Growth
Conor Kelly is the Growth lead at Humanloop. He is an expert in generative AI application development and did graduate research on retrieval augmented generation (RAG) at UCL. Conor regularly appears on high-profile media outlets to discuss generative AI and was nominated as Ireland's Young AI Role Model of the Year in 2023.
𝕏: @conorkellyai
LinkedIn

Ready to build successful AI products?

Book a 1:1 demo for a guided tour of the platform tailored to your organization.
