Gusto Uses Humanloop to Scale AI-Powered Customer Support and Workflow Automation

Gusto, a leading payroll and HR platform, has used Humanloop to improve the accuracy, efficiency, and scalability of its AI-powered customer support and of Gus, its customer-facing AI agent. By integrating Humanloop into its AI development workflow, Gusto has tripled its AI deflection rate and saved millions in support costs, while making Gus more intuitive and capable for customers.

Edward Kim

Founder and Chief Architect, Gusto

Humanloop is our eval platform. You wouldn’t write code without writing unit tests, and you shouldn’t deploy AI models without evaluations. Humanloop ensures we only ship models that meet our quality standards.

3x

Improvement in AI app accuracy

$MMs

Saved in costs

100+

Employees collaborating on AI

Problem

Building Reliable AI-Powered Support and Reporting

As Gusto expanded its AI initiatives, the company was committed to ensuring the accuracy and reliability of its AI-powered assistant, Gus.

Gus was designed to answer customers' payroll and HR questions and to help generate complex reports. However, when Gusto first tested Gus internally, three needs became clear:

  • Accuracy: Ensuring the AI pulled the correct data so that results were consistent.
  • Systematic Evaluation: A structured way to measure whether changes improved performance, rather than relying on manual prompt tweaking.
  • Workflow Efficiency: Speeding up iteration by making domain experts responsible for refining AI responses.
“Without a systematic evaluation process, it's not possible to know if changes will lead to regressions. Humanloop gives us confidence to iterate more quickly.”
Edward Kim
Founder and Chief Architect
Solution

Humanloop as Gusto’s AI Evaluation System

Gusto integrated Humanloop as its AI evaluation system, creating a structured process to ensure continuous improvement. Key implementations included:

1. Automated Evaluations in CI/CD

  • Gusto built a pipeline where every AI model or prompt update runs against a dataset of real-world queries stored in Humanloop.
  • Humanloop's API is integrated into Gusto’s CI/CD system, automatically evaluating accuracy and blocking deployment if quality thresholds aren’t met.
“Anytime an engineer submits a pull request, Humanloop runs evals and posts the accuracy score directly in the PR. If it doesn’t meet our quality threshold, we block deployment.”
Edward Kim
Founder and Chief Architect
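
The gating step described above can be sketched in a few lines. This is a minimal illustration, not Gusto's pipeline and not Humanloop's API: the names EvalCase, accuracy, and gate are hypothetical, the dataset is assumed to be a list of query and expected-answer pairs, and scoring by exact match is an assumption that a real evaluator would likely replace with something richer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalCase:
    """One real-world query paired with the answer reviewers expect (hypothetical shape)."""
    query: str
    expected: str


def accuracy(answer_fn: Callable[[str], str], cases: List[EvalCase]) -> float:
    """Return the fraction of cases the candidate answers correctly.

    Assumption: correctness is a case-insensitive exact match.
    """
    if not cases:
        raise ValueError("evaluation dataset is empty")
    correct = sum(
        1
        for case in cases
        if answer_fn(case.query).strip().lower() == case.expected.strip().lower()
    )
    return correct / len(cases)


def gate(score: float, threshold: float) -> int:
    """Map a score to a process exit code: 0 lets the pipeline continue, 1 blocks it."""
    return 0 if score >= threshold else 1
```

A CI job could call accuracy on the candidate prompt or model, print the score so it appears in the pull request log, and pass the result of gate to sys.exit. Most CI systems treat a non-zero exit code as a failed step, which is what blocks the deployment.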

2. Enhancing Gus: The AI Assistant

  • Improved Conversational AI: Gus can now handle many tasks that were previously only available through Gusto’s web interface.
  • Better Accuracy: Using Humanloop’s evaluation system, Gus is continuously refined based on real-world interactions.
  • Faster Response Times: Customers get near-instant answers to HR and payroll-related questions, significantly improving satisfaction.
Gus: the AI-powered assistant helping small business owners save time, get personalized insights, and make smarter decisions about their business

“One of Gus’s most impactful use cases is reporting. Previously, generating and analyzing complex reports took hours. Now, Gus handles it in seconds, analyzing data across multiple client accounts.”
Edward Kim
Founder and Chief Architect

3. Empowering Domain Experts

  • Instead of engineers tweaking AI prompts, Gusto’s customer support team can now refine Gus’s responses using an internal tool called Gus Studio.
  • These experts review AI interactions, promote logs to training datasets in Humanloop, and iteratively improve AI accuracy.
“Our CX team now owns their own set of problems. They review real interactions, add cases to Humanloop, and see immediate impact on accuracy.”
Edward Kim
Founder and Chief Architect

4. Debugging and Continuous Monitoring 24/7

  • Humanloop’s Flows feature allows Gusto to analyze AI conversations and identify areas of improvement.
  • A daily report surfaces interactions that need refinement, deep-linking into Humanloop for debugging.
“Every morning, I start my day reviewing interactions in Humanloop. It’s like debugging a conversation, knowing exactly what can be improved.”
Edward Kim
Founder and Chief Architect
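
A daily report of this kind could be assembled roughly as follows. This is a sketch under stated assumptions: the Interaction record, the 0.7 score cutoff, and the base_url used for deep links are all hypothetical and do not reflect Humanloop's actual log schema or URL format.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Interaction:
    """A logged conversation turn with an evaluator score (hypothetical shape)."""
    log_id: str
    question: str
    score: float  # assumed range 0.0 to 1.0, higher is better


def build_daily_report(
    interactions: Iterable[Interaction],
    min_score: float = 0.7,  # assumed cutoff for "needs refinement"
    base_url: str = "https://example.invalid/logs/",  # placeholder, not a real endpoint
) -> List[str]:
    """Return one line per low-scoring interaction, worst first, with a deep link."""
    flagged = sorted(
        (i for i in interactions if i.score < min_score),
        key=lambda i: i.score,
    )
    return [f"{i.score:.2f}  {i.question}  {base_url}{i.log_id}" for i in flagged]
```

Sorting the worst scores to the top means a reviewer who only has time for a few items each morning sees the most serious failures first.
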
Results

Better AI Applications, Deployed Faster with Enhanced Visibility

3x Increase in AI-Powered Case Deflection

  • Before Humanloop: 10% of customer inquiries were resolved by AI.
  • After Humanloop: AI successfully resolves 30% of inquiries.
  • Projected Goal: Over 50% AI-driven resolution with continued prompt improvements.
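
The arithmetic behind the 3x figure can be checked directly. In this small sketch the function names are hypothetical and the inquiry counts (out of 100) are illustrative stand-ins for the percentages quoted above, not Gusto's actual volumes. Exact fractions are used so the ratios come out without floating-point rounding.

```python
from fractions import Fraction


def deflection_rate(resolved_by_ai: int, total_inquiries: int) -> Fraction:
    """Share of inquiries resolved by AI without reaching a human agent."""
    if total_inquiries <= 0:
        raise ValueError("total_inquiries must be positive")
    return Fraction(resolved_by_ai, total_inquiries)


def improvement_factor(before: Fraction, after: Fraction) -> Fraction:
    """How many times larger the new rate is than the old one."""
    if before == 0:
        raise ValueError("baseline rate must be non-zero")
    return after / before


# Illustrative counts per 100 inquiries, matching the quoted percentages.
before = deflection_rate(10, 100)  # 10% before Humanloop
after = deflection_rate(30, 100)   # 30% after Humanloop
goal = deflection_rate(50, 100)    # 50% projected goal

print(improvement_factor(before, after))  # 3
print(improvement_factor(after, goal))    # 5/3
```

Reaching the 50% goal from today's 30% would therefore require a further improvement of roughly 1.67x.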

Faster AI Model Deployment with Confidence

  • AI prompt changes now go through rigorous evaluation before deployment.
  • Support experts can refine AI performance without engineering bottlenecks.
  • Engineers focus on building new capabilities rather than manually fixing prompt failures.
Finding a solution

Why Gusto Chose Humanloop

  • Seamless CI/CD Integration: Fully embedded into Gusto’s deployment workflow.
  • Real-World Data for Evaluations: Uses actual customer interactions, not hypothetical test cases.
  • Scalability: Supports both engineering teams and non-technical domain experts.
  • Improved Debugging: Daily logs surface failure points for continuous improvement.
Customer success

Looking Ahead

Gusto plans to further expand its AI capabilities, leveraging Humanloop to:

  • Improve AI-driven report generation for Gusto users, who include accountants and small business owners.
  • Increase accuracy and personalization in customer interactions.
  • Enhance internal tooling for domain experts to fine-tune AI responses more effectively.
“I really want to create a culture at Gusto where anyone deploying AI models does it with evals built-in. You wouldn’t deploy code without tests, and AI should be no different.”
Edward Kim
Founder and Chief Architect