How FMG solves LLM evaluation with Humanloop

FMG is a marketing platform for financial advisors. Operating in a regulated industry makes AI applications higher risk. Thanks to Humanloop they can measure AI performance and ensure their applications are reliable.

The Problem

FMG didn't have an objective way to measure their AI product's performance

FMG was developing many different LLM-powered applications. For example, they were automating email responses for customer support and using AI to automate compliance checks for their customers' websites. They knew this was just the beginning and they'd be developing many more AI applications in the future.

The problem they faced was that they didn't have a reliable way to measure the performance of their applications.

“I was driving AI adoption and one of my biggest concerns was how am I going to know if this model, this use of AI is actually working or not?!”

Lucas Jans

Product leader at FMG

FMG considered building an internal tool for prompt development and AI evaluation, but as they mapped out all the features they would need, they realised it would be a complex product to build and maintain. Their key priority was a platform that allowed them to:

gather real-time feedback and monitor live performance
measure performance objectively during development and detect regressions upon changes
keep product leaders in the driving seat while facilitating seamless collaboration with engineers on AI feature development

The Solution

Humanloop's evaluation tools allowed FMG to confidently accelerate product development

Using Humanloop's tools, the FMG team can take the guesswork out of their application development. They use Humanloop's evaluators and dataset features to get objective feedback on their AI applications as they develop them. This lets them compare different prompts and models as they iterate and make decisions on how to improve their systems.

For example, they were able to fine-tune smaller models and show they matched the performance of GPT-4. This allowed them to make a 15x cost saving.

FMG's team set up a suite of different evaluators in Humanloop, ranging from simple tests such as latency thresholds to more complex rule-based checks as well as explicit and implicit human feedback from their user-facing applications.
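To make the idea of an evaluator suite concrete, here is a minimal sketch in plain Python of a latency threshold, a rule-based check and an explicit feedback score applied to a batch of logged outputs. It is illustrative only: it does not use Humanloop's SDK, and the `LogRecord` shape, the function names and the banned phrase are all hypothetical assumptions rather than FMG's actual setup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class LogRecord:
    """One logged model call. Hypothetical shape, not Humanloop's log schema."""
    output: str
    latency_seconds: float
    user_rating: Optional[int] = None  # explicit feedback: 1 = good, 0 = bad, None = no vote


Evaluator = Callable[[LogRecord], bool]


def latency_under(threshold_seconds: float) -> Evaluator:
    """Simple test: pass if the call finished within the threshold."""
    def check(record: LogRecord) -> bool:
        return record.latency_seconds <= threshold_seconds
    return check


def contains_none_of(banned_phrases: List[str]) -> Evaluator:
    """Rule-based check: pass if no banned phrase appears in the output."""
    def check(record: LogRecord) -> bool:
        lowered = record.output.lower()
        return not any(phrase.lower() in lowered for phrase in banned_phrases)
    return check


def pass_rates(records: List[LogRecord], evaluators: Dict[str, Evaluator]) -> Dict[str, float]:
    """Fraction of records that pass each named evaluator."""
    if not records:
        raise ValueError("need at least one record to evaluate")
    return {
        name: sum(1 for r in records if evaluate(r)) / len(records)
        for name, evaluate in evaluators.items()
    }


def mean_user_rating(records: List[LogRecord]) -> Optional[float]:
    """Average explicit rating over records that received a vote, if any."""
    ratings = [r.user_rating for r in records if r.user_rating is not None]
    return sum(ratings) / len(ratings) if ratings else None


if __name__ == "__main__":
    # Made-up example logs for illustration only.
    logs = [
        LogRecord("Thanks for reaching out, we will reply soon.", 0.8, 1),
        LogRecord("This fund offers guaranteed returns.", 2.5, 0),
        LogRecord("Here is your requested report.", 1.2),
    ]
    suite = {
        "latency": latency_under(2.0),
        "compliance": contains_none_of(["guaranteed returns"]),  # hypothetical rule
    }
    print(pass_rates(logs, suite))
    print(mean_user_rating(logs))
```

Running the same suite over the same dataset for two prompts or two models gives directly comparable pass rates, which is the kind of objective comparison described above.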

Humanloop's evaluation suite helps improve whatever metric you care about. Humanloop helped FMG reduce the cost of their AI products by fine-tuning smaller models.

The Result

Humanloop saves FMG hundreds of engineering hours

Were it not for Humanloop, FMG say they would not have been able to develop their AI features or get clients to trust in their AI applications. Tasks that would have taken hundreds of hours are now automatic.

“Humanloop has continued to build things that we didn't realise we would need that have significantly improved the way we build and operate our LLM apps. This is just the beginning, so we're excited about what's next.”

Lucas Jans

Director of Product, FMG
