Gusto Uses Humanloop to Scale AI-Powered Customer Support and Workflow Automation

Gusto, a leading payroll and HR platform, uses Humanloop to improve the accuracy, efficiency, and scalability of its AI-powered customer support and its customer-facing AI agent, Gus. By integrating Humanloop into its AI development workflow, Gusto has tripled its AI deflection rate and saved millions in support costs, while making Gus more intuitive and capable for customers.

Edward Kim

Founder and Chief Architect, Gusto

Humanloop is our eval platform. You wouldn’t write code without writing unit tests, and you shouldn’t deploy AI models without evaluations. Humanloop ensures we only ship models that meet our quality standards.

Improvement in AI app accuracy

$MMs

Saved in costs

100+

Employees collaborating on AI

Problem

Building Reliable AI-Powered Support and Reporting

As Gusto expanded its AI initiatives, the company was committed to ensuring the accuracy and reliability of its AI-powered assistant, Gus.

Gus was designed to assist customers with payroll and HR-related questions, as well as help generate complex reports. When Gusto first tested Gus internally, however, it found it needed three things:

Accuracy: Ensuring the AI pulled the correct data and produced consistent results.
Systematic Evaluation: A structured way to measure whether changes were improving performance, rather than manually tweaking prompts.
Workflow Efficiency: Speeding up iteration by making domain experts responsible for refining AI responses.

“Without a systematic evaluation process, it's not possible to know if changes will lead to regressions. Humanloop gives us confidence to iterate more quickly.”

Edward Kim

Founder and Chief Architect

Solution

Humanloop as Gusto’s AI Evaluation System

Gusto integrated Humanloop as its AI evaluation system, creating a structured process to ensure continuous improvement. Key implementations included:

1. Automated Evaluations in CI/CD

Gusto built a pipeline where every AI model or prompt update runs against a dataset of real-world queries stored in Humanloop.
Humanloop's API is integrated into Gusto’s CI/CD system, automatically evaluating accuracy and blocking deployment if quality thresholds aren’t met.

“Anytime an engineer submits a pull request, Humanloop runs evals and posts the accuracy score directly in the PR. If it doesn’t meet our quality threshold, we block deployment.”

Edward Kim

Founder and Chief Architect
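
The gating step described above can be illustrated with a minimal sketch. This is not Gusto's pipeline and it does not call Humanloop's API: the EvalCase type, the accuracy and gate functions, the exact-match scoring rule, and the 0.9 threshold are all assumptions made for illustration. It only shows the general shape of scoring a candidate against a dataset and turning the score into a pass or fail signal.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalCase:
    """One evaluation case (hypothetical shape): a query and its approved answer."""
    query: str
    expected: str


def accuracy(cases: Iterable[EvalCase], answer: Callable[[str], str]) -> float:
    """Fraction of cases where the candidate's answer matches the expected one.

    Exact match after trimming and lowercasing is an assumption; a real
    evaluator would likely use a more forgiving comparison.
    """
    cases = list(cases)
    if not cases:
        raise ValueError("evaluation dataset is empty")
    correct = sum(
        1
        for case in cases
        if answer(case.query).strip().lower() == case.expected.strip().lower()
    )
    return correct / len(cases)


def gate(score: float, threshold: float) -> int:
    """Return a process exit code: 0 lets the pipeline continue, 1 blocks it."""
    return 0 if score >= threshold else 1


# Placeholder dataset and a stand-in "model" that only knows one answer.
dataset = [
    EvalCase("example question 1", "expected answer 1"),
    EvalCase("example question 2", "expected answer 2"),
]
known_answers = {"example question 1": "expected answer 1"}

score = accuracy(dataset, lambda query: known_answers.get(query, ""))
exit_code = gate(score, threshold=0.9)
print(f"accuracy={score:.0%} exit_code={exit_code}")
```

In a CI job, the exit code could be passed to sys.exit so that a score below the threshold fails the build, and the printed score could be posted as a pull request comment.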

2. Enhancing Gus: The AI Assistant

Improved Conversational AI: Gus can now handle many tasks that were previously only available through Gusto’s web interface.
Better Accuracy: Using Humanloop’s evaluation system, Gus is continuously refined based on real-world interactions.
Faster Response Times: Customers get near-instant answers to HR and payroll-related questions, significantly improving satisfaction.

Gus: the AI-powered assistant helping small business owners save time, get personalized insights, and make smarter decisions about their business

“One of Gus’s most impactful use cases is reporting. Previously, generating and analyzing complex reports took hours. Now, Gus handles it in seconds, analyzing data across multiple client accounts.”

Edward Kim

Founder and Chief Architect

3. Empowering Domain Experts

Instead of engineers tweaking AI prompts, Gusto’s customer support team can now refine Gus’s responses using an internal tool called Gus Studio.
These experts review AI interactions, promote logs to training datasets in Humanloop, and iteratively improve AI accuracy.

“Our CX team now owns their own set of problems. They review real interactions, add cases to Humanloop, and see immediate impact on accuracy.”

Edward Kim

Founder and Chief Architect
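
As a rough illustration of what promoting a reviewed log into a dataset involves, the sketch below models it with plain in-memory types. The Log and Dataset classes and the promote method are hypothetical names, not Gus Studio or Humanloop interfaces. The one design point it shows is that a reviewer's correction, when present, replaces the model's original response before the case is stored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Log:
    """A recorded interaction (hypothetical shape)."""
    query: str
    response: str


@dataclass
class Dataset:
    """Maps each query to the answer a reviewer approved."""
    cases: Dict[str, str] = field(default_factory=dict)

    def promote(self, log: Log, corrected: Optional[str] = None) -> None:
        """Store a reviewed log as a case.

        If the reviewer supplied a correction, it overrides the model's
        response; otherwise the response is accepted as-is.
        """
        self.cases[log.query] = corrected if corrected is not None else log.response


dataset = Dataset()
dataset.promote(Log("example question 1", "good response"))
dataset.promote(Log("example question 2", "poor response"), corrected="reviewed answer")
print(dataset.cases)
```

Keying cases by query means that promoting the same query again replaces the earlier case, which keeps the dataset free of duplicates at the cost of retaining only the latest review.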

4. Debugging and Continuous Monitoring 24/7

Humanloop’s Flows feature allows Gusto to analyze AI conversations and identify areas of improvement.
A daily report surfaces interactions that need refinement, deep-linking into Humanloop for debugging.

“Every morning, I start my day reviewing interactions in Humanloop. It’s like debugging a conversation, knowing exactly what can be improved.”

Edward Kim

Founder and Chief Architect
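
A daily report of this kind can be sketched as a filter plus a link builder. Everything here is an assumption for illustration: the Interaction fields, the 0.7 score cutoff, and the example.com base URL are placeholders, and the source does not describe how Gusto decides which interactions need refinement.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Interaction:
    """A logged conversation with an evaluator score in [0, 1] (hypothetical)."""
    log_id: str
    score: float


def daily_report(
    interactions: Iterable[Interaction],
    min_score: float = 0.7,                       # assumed cutoff
    base_url: str = "https://example.com/logs",   # placeholder, not a real endpoint
) -> List[str]:
    """List interactions scoring below min_score, worst first, each with a deep link."""
    flagged = sorted(
        (item for item in interactions if item.score < min_score),
        key=lambda item: item.score,
    )
    return [f"{item.score:.2f}  {base_url}/{item.log_id}" for item in flagged]


report = daily_report(
    [Interaction("a", 0.9), Interaction("b", 0.2), Interaction("c", 0.5)]
)
for line in report:
    print(line)
```

Sorting worst first puts the interactions most in need of attention at the top of the report.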

Results

Better AI Applications, Deployed Faster with Enhanced Visibility

3x Increase in AI-Powered Case Deflection

Before Humanloop: 10% of customer inquiries were resolved by AI.
After Humanloop: AI successfully resolves 30% of inquiries.
Projected Goal: Over 50% AI-driven resolution with continued prompt improvements.
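
The arithmetic behind the 3x figure is straightforward: deflection rate is the share of inquiries resolved by AI, and the improvement is the ratio of the two rates. The sketch below uses illustrative counts per 1,000 inquiries, which are assumptions chosen only to match the 10% and 30% rates stated above, not reported volumes.

```python
def deflection_rate(resolved_by_ai: int, total_inquiries: int) -> float:
    """Share of inquiries resolved by AI without reaching a human agent."""
    if total_inquiries <= 0:
        raise ValueError("total_inquiries must be positive")
    return resolved_by_ai / total_inquiries


# Illustrative counts per 1,000 inquiries (assumed, chosen to match 10% and 30%).
before = deflection_rate(100, 1000)
after = deflection_rate(300, 1000)
improvement = after / before

print(f"before={before:.0%} after={after:.0%} improvement={improvement:.1f}x")
```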

Faster AI Model Deployment with Confidence

AI prompt changes now go through rigorous evaluation before deployment.
Support experts can refine AI performance without engineering bottlenecks.
Engineers focus on building new capabilities rather than manually fixing prompt failures.

Finding a solution

Why Gusto Chose Humanloop

Seamless CI/CD Integration: Fully embedded into Gusto’s deployment workflow.
Real-World Data for Evaluations: Uses actual customer interactions, not hypothetical test cases.
Scalability: Supports both engineering teams and non-technical domain experts.
Improved Debugging: Daily logs surface failure points for continuous improvement.

Customer success

Looking Ahead

Gusto plans to further expand its AI capabilities, leveraging Humanloop to:

Improve AI-driven report generation for Gusto users, including accountants and small business owners.
Increase accuracy and personalization in customer interactions.
Enhance internal tooling for domain experts to fine-tune AI responses more effectively.

“I really want to create a culture at Gusto where anyone deploying AI models does it with evals built-in. You wouldn’t deploy code without tests, and AI should be no different.”

Edward Kim

Founder and Chief Architect
