Filevine

Filevine Uses Humanloop to Fast-Track AI Products, Doubling Annual Revenue

Filevine, a legal case management platform, launched six new AI products within a year, nearly doubling its ARR and earning recognition as one of the fastest-growing private companies.

Brianna Connelly

VP of Data Science, Filevine

Humanloop has become our system of record for all GenAI prompt management. Now our legal domain experts can collaborate with our entire team of data scientists, PMs, and engineers to scale and iterate our rapidly growing GenAI product lines.

This has been a massive unlock for team productivity and product iteration, helping us scale our AI product SKUs and achieve significant revenue gains and customer satisfaction.

6

AI products released within 1 year

2x

Revenue generated from AI product SKUs within 1 year

16 hrs

Average time saved per week on evaluation & prompt management

Problem

Manual, time-consuming prompt management

Scaling LLM Apps to Production

When Brianna Connelly, VP of Data Science, joined Filevine in May 2023, the company had just launched DemandsAI, its first AI product, which allows law firms to securely generate demand letters by leveraging LLMs, prompt engineering, and RAG. This was a first-of-its-kind solution for the legal case management industry, which has been slow to adopt GenAI solutions due to heightened concerns around data privacy and accuracy.
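As a rough illustration of the RAG pattern behind a product like DemandsAI (a sketch under assumptions, not Filevine’s actual implementation), the flow retrieves only the case documents relevant to a request and grounds the model in them; every function below is a hypothetical stand-in.

    # Illustrative RAG flow for drafting a demand letter. Every function here is a
    # simplified, hypothetical stand-in, not Filevine's implementation.

    def retrieve_case_documents(query: str, k: int = 5) -> list[str]:
        """Stand-in for an embedding / vector search over the firm's case files."""
        raise NotImplementedError

    def call_llm(prompt: str) -> str:
        """Stand-in for whichever model client the product uses."""
        raise NotImplementedError

    def draft_demand_letter(case_summary: str) -> str:
        # 1. Retrieve only the documents relevant to this case.
        context = "\n\n".join(retrieve_case_documents(case_summary))
        # 2. Ground the model in that context with an engineered prompt.
        prompt = (
            "You are drafting a demand letter for a law firm.\n"
            f"Case summary:\n{case_summary}\n\n"
            f"Relevant case documents:\n{context}\n\n"
            "Draft a demand letter citing only facts found in the documents above."
        )
        return call_llm(prompt)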

The product launch was extremely successful, instantly boosting Filevine’s quarterly ARR, but it also created a problem: the team needed tooling that would let its existing prompt management and engineering processes scale.

Due to the compliance-heavy nature of Filevine’s legal work platform, the company found itself caught between the need to adhere to the highest security standards and the mission to be first to market with AI tools that enhance legal work.

“I'm convinced the vast majority of companies leveraging generative AI today are operating in the dark. Everyone's just hitting LLM APIs without a clear understanding of how they are functioning.”
Brianna Connelly
VP of Data Science

Moving from Manual to Holistic Prompt Management

Prompts were manually prototyped by domain experts and handed off to PMs and Senior Software Engineers, who incorporated them into AI features and shipped them to production. The AI team had no visibility into LLM performance logs and outputs once the product went live.

Additionally, the team was watching more and more LLMs get released to the public and wanted to experiment with using different models for different tasks, but the overhead of switching models and restarting the prototyping and evaluation process from scratch was too great under the existing system.

“Before Humanloop, our prompt management and evaluation process was extremely manual and time-consuming, done entirely on spreadsheets by legal domain experts. This created a significant bottleneck that slowed down our product roadmap and prevented us from adopting powerful new models.”
Brianna Connelly
VP of Data Science

Removing Manual Blockers From AI Product Development

The manual nature of Filevine’s prompt evaluation process drained human resources and slowed team productivity. Brianna knew that without significant change to Filevine’s LLM ops backend, the AI product team wouldn’t be able to meet the ambitious AI product roadmap and revenue demands the company had set.

Brianna discovered Humanloop while listening to an interview on the Latent Space Substack featuring Humanloop CEO Raza Habib. She immediately requested a demo.

“I'm convinced Latent Space has a microphone in my office,” said Brianna. “The pain points Raza described about how to evaluate LLMs beyond subjective human reviews especially resonated.”

Filevine's legal domain experts can now collaborate with their entire AI team of data scientists, PMs, and engineers to scale and iterate their rapidly growing GenAI product lines.
Solution

Creating a single source of truth for prompt management and collaboration

Evaluating LLMs for performance and accuracy

It took three months to fully implement Humanloop in Filevine's environment, making the Humanloop platform Filevine's new single source of truth for prompt management and evaluation.

Once all data sources, models, prompts, and prompt outputs were fully connected, Filevine finally had a clear view of prompt performance and accuracy. This let the team quickly triage and diagnose poorly performing prompts, iterate on prompts in response to customer feedback, and experiment with new LLMs, evaluating multiple models to find the best match for each product use case.

In some cases, when customizing the Filevine product for large enterprise POCs, Filevine used Humanloop to tailor prompts to individual clients, ensuring those clients got the highest level of UX and customization from the platform’s AI features.

Creating a powerhouse AI team

Filevine’s team efficiency and collaboration also increased. With Humanloop, Filevine’s legal domain experts are no longer a bottleneck in the evaluation process: using the Humanloop web application, they can evaluate prompt outputs and leave comments directly in the app, leaving Filevine’s GenAI codebase untouched so the PM and engineering teams can push prompt changes to production without redeploying the application. This took the iteration cycle from 3 days down to 5 minutes.
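A minimal sketch of the pattern that enables this, assuming a prompt-management service the application queries at request time; the endpoint, prompt path, and helper functions below are illustrative assumptions, not Humanloop’s actual API.

    import requests

    PROMPT_SERVICE = "https://prompts.example.com"  # hypothetical prompt-management endpoint

    def fetch_deployed_prompt(path: str) -> dict:
        """Fetch whichever prompt version is currently deployed, instead of hard-coding it."""
        resp = requests.get(f"{PROMPT_SERVICE}/prompts/{path}/deployed", timeout=10)
        resp.raise_for_status()
        return resp.json()  # e.g. {"model": "...", "template": "..."}

    def call_llm(model: str, prompt: str) -> str:
        """Stand-in for whichever model client the application actually uses."""
        raise NotImplementedError

    def run_ai_feature(inputs: dict) -> str:
        # Domain experts edit and redeploy the prompt in the management UI; the
        # application picks up the new version on the next request, with no redeploy.
        prompt = fetch_deployed_prompt("demands/letter-generation")
        return call_llm(model=prompt["model"], prompt=prompt["template"].format(**inputs))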

“With Humanloop, our legal domain experts can now collaborate with our entire AI team of data scientists, PMs, and engineers to scale and iterate our rapidly growing GenAI product lines. This collaboration has been a massive unlock for team productivity and product iteration.”
Alex McLaughlin
Vice President of Product at Filevine

Optimizing LLMs for Performance

For lawyers, understanding primary source data is critical. This need for data fidelity and high-quality prompt outputs helped Brianna justify the investment in Humanloop. With Humanloop, the Filevine team unlocked new capabilities such as A/B testing for prompts and designed a more streamlined human review process, using Humanloop’s evaluation tools to assess prompt performance on objective metrics like latency and accuracy as well as more subjective metrics like tone.
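As an illustration of this kind of evaluation (a sketch under assumptions, not Filevine’s or Humanloop’s actual harness), the snippet below scores a prompt variant on latency, accuracy against reference answers, and an LLM-judged tone score, so two variants can be A/B compared on the same test set; the call_llm and judge_tone helpers are hypothetical.

    import statistics
    import time

    def evaluate_variant(variant: dict, test_cases: list[dict], call_llm, judge_tone) -> dict:
        """Score one prompt variant on objective (latency, accuracy) and subjective (tone) metrics."""
        latencies, correct, tone_scores = [], 0, []
        for case in test_cases:
            start = time.perf_counter()
            output = call_llm(model=variant["model"], prompt=variant["template"].format(**case["inputs"]))
            latencies.append(time.perf_counter() - start)
            correct += int(case["expected"].lower() in output.lower())  # crude accuracy check
            tone_scores.append(judge_tone(output))  # LLM-as-judge score, e.g. 1-5 for professional tone
        return {
            "variant": variant["name"],
            "median_latency_s": statistics.median(latencies),
            "accuracy": correct / len(test_cases),
            "mean_tone": statistics.mean(tone_scores),
        }

    # A/B test: run both variants over the same test set and compare the rows side by side.
    # results = [evaluate_variant(v, test_cases, call_llm, judge_tone) for v in (variant_a, variant_b)]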

Humanloop’s log management features help engineers quickly debug and fix performance issues, and its reporting capabilities let Filevine’s AI team share performance and accuracy results with senior leadership to support production decisions.

Filevine had particularly stringent security requirements because it handles sensitive legal data. The team was reassured by Humanloop’s SOC2 compliance and opted for a Virtual Private Cloud (VPC) deployment: the whole Humanloop application runs in a private AWS Fargate environment where Filevine controls the encryption keys. This setup ensures that their log data is isolated and protected within a dedicated virtual network, offering enhanced security, control, and compliance with industry standards.

Result

Unprecedented revenue gains and customer satisfaction

Increasing AI Product Release Velocity

While working with Humanloop and independently investing in its own proprietary AI infrastructure and development processes, Filevine has experienced record-breaking growth across its new AI product SKUs while saving critical time on internal product development cycles.

"We anticipate continual ARR growth in AI thanks to faster development time via Humanloop. It’s very exciting. Everybody believes in this mission and we have the ARR and product adoption from our AI product lines to prove that."
Brianna Connelly
VP of Data Science

In the near future, Brianna and the Filevine AI product team look forward to leveraging Humanloop to advance additional RAG and vector search projects, evaluate more open-source LLMs like Llama and Mistral, and explore new fine-tuning use cases to increase LLM performance.

To learn more about Humanloop, check out our docs and request a demo today.

1m+

Legal documents processed per day

Models used

Google Gemini

Google PaLM

Anthropic Claude

LLM Use Cases

Data Ingest & Extraction

OCR Image Recognition

RAG

Text Summarization

Field Population

LLM Self-Evaluation

Retrieval
