Brianna Connelly
VP of Data Science, Filevine
Humanloop has become our system of record for all GenAI prompt management. Now our legal domain experts can collaborate with our entire team of data scientists, PMs, and engineers to scale and iterate our rapidly growing GenAI product lines.
This has been a massive unlock for team productivity and product iteration, helping us scale our AI product SKUs and achieve significant revenue gains and customer satisfaction.
6
AI products released within 1 year
2x
Revenue generated from AI product SKUs within 1 year
16 hrs
Average time saved per week on evaluation & prompt management
Manual, time-consuming prompt management
Scaling LLM Apps to Production
When Brianna Connelly, VP of Data Science, joined Filevine in May 2023, the company had just launched DemandsAI, its first AI product, which allows law firms to securely generate demand letters using LLMs, prompt engineering, and RAG. This was a first-of-its-kind solution for the legal case management industry, which has been slow to adopt GenAI solutions due to heightened concerns around data privacy and accuracy.
The product launch was extremely successful, instantly boosting Filevine’s quarterly ARR, but it also created a problem: the team immediately recognized that it needed solutions that would allow its existing prompt management and engineering processes to scale.
Due to the compliance-heavy nature of Filevine’s legal work platform, the company found itself caught between the need to adhere to the highest security standards, and the mission to be first to market in producing AI tools to enhance legal work.
Moving from Manual to Holistic Prompt Management
Prompts were manually prototyped by domain experts, then handed off to PMs and senior software engineers to incorporate into AI features and deploy to production. Once a product went live, the AI team had no visibility into LLM performance logs and outputs.
Additionally, the team was watching more and more LLMs being released to the public and wanted to experiment with different models for different tasks. But switching models meant restarting the prototyping and evaluation process from scratch, which was too much overhead for the existing system.
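Much of that switching overhead comes from model calls being hard-wired into each feature. One common mitigation is a small task-to-model registry, so swapping in a newly released model becomes a configuration change rather than a rewrite. The sketch below is purely illustrative; the model names and the stubbed completion functions are hypothetical, not Filevine’s actual stack:

```python
from typing import Callable, Dict

def _stub_completion(model: str) -> Callable[[str], str]:
    """Stand-in for a real provider call (an HTTP request in practice)."""
    def complete(prompt: str) -> str:
        return f"[{model}] response to: {prompt[:40]}"
    return complete

# Task -> model registry: re-pointing a task at a new model is a
# one-line change here, instead of a per-feature code rewrite.
ROUTES: Dict[str, Callable[[str], str]] = {
    "summarization": _stub_completion("claude-3"),
    "field_population": _stub_completion("gemini-pro"),
}

def run_task(task: str, prompt: str) -> str:
    """Dispatch a prompt to whichever model currently backs the task."""
    if task not in ROUTES:
        raise KeyError(f"no model registered for task {task!r}")
    return ROUTES[task](prompt)

print(run_task("summarization", "Summarize this deposition transcript."))
```

With this shape, prototyping a different model for one task leaves every other feature untouched.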
Removing Manual Blockers From AI Product Development
The manual nature of Filevine’s prompt evaluation process drained human resources and slowed team productivity. Brianna knew that without significant changes to Filevine’s LLM ops backend, the AI product team wouldn’t be able to meet the ambitious AI product roadmap and revenue targets the company had set.
Brianna discovered Humanloop while listening to an interview on the Latent Space Substack featuring Humanloop CEO Raza Habib. She immediately requested a demo.
“I'm convinced Latent Space has a microphone in my office,” said Brianna. “The pain points Raza described about how to evaluate LLMs beyond subjective human reviews especially resonated.”
Creating a single-source-of-truth for prompt management and collaboration
Evaluating LLMs for performance and accuracy
It took 3 months to fully implement Humanloop into Filevine's environment, making the Humanloop platform Filevine's new single-source-of-truth for all things prompt management and evaluations.
Once all data sources, models, prompts, and prompt outputs were fully connected, Filevine finally had a clear view of prompt performance and accuracy. This helped the team quickly triage and diagnose poorly performing prompts, iterate on prompts based on customer feedback, and experiment with new LLMs, evaluating multiple models to find the best match for each product use case.
In some cases, when customizing the Filevine product for large enterprise POCs, Filevine leveraged Humanloop to tailor prompts for individual clients, ensuring those clients experienced the highest level of UX and customization from the AI features in their platform.
Creating a powerhouse AI team
Filevine’s team efficiency and collaboration also increased. With Humanloop, Filevine’s legal domain experts are no longer a bottleneck in the evaluation process. Using the user-friendly Humanloop web application, domain experts can evaluate prompt outputs and leave comments directly in the app, while Filevine’s GenAI codebase stays intact and the PM and Engineering teams can push prompt changes to production without redeploying the application. This took their iteration cycle from 3 days down to 5 minutes.
Optimizing LLMs for Performance
Understanding primary source data is critical for lawyers. This need for data efficacy and high-fidelity prompt outputs helped Brianna justify the investment in Humanloop. With Humanloop, the Filevine team unlocked new capabilities such as A/B testing for prompts and designed a more streamlined human review process, leveraging Humanloop’s evaluation tools to assess prompt performance on objective metrics like latency and accuracy as well as more subjective metrics like tone.
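As a rough illustration of what such an A/B evaluation measures (this is not Humanloop’s actual API), the sketch below runs two hypothetical prompt variants over the same labeled examples and compares accuracy and latency. The `model_call` stub and the example data are invented stand-ins for a real LLM request:

```python
import time
from statistics import mean

# Labeled examples for an extraction-style task (hypothetical data).
EXAMPLES = [
    {"input": "Invoice total: $1,200", "expected": "1200"},
    {"input": "Invoice total: $850", "expected": "850"},
]

def model_call(prompt: str, text: str) -> str:
    """Stub standing in for an LLM call; behavior depends on the prompt."""
    if "extract digits" in prompt:
        return "".join(ch for ch in text if ch.isdigit())
    return text  # a deliberately weaker variant that just echoes the input

def evaluate(prompt: str) -> dict:
    """Score one prompt variant on accuracy and average latency."""
    latencies, correct = [], 0
    for ex in EXAMPLES:
        start = time.perf_counter()
        output = model_call(prompt, ex["input"])
        latencies.append(time.perf_counter() - start)
        correct += output == ex["expected"]
    return {"accuracy": correct / len(EXAMPLES), "avg_latency_s": mean(latencies)}

VARIANTS = {"A": "extract digits from the text", "B": "return the total"}
results = {name: evaluate(prompt) for name, prompt in VARIANTS.items()}
print(results)
```

Subjective metrics like tone don’t reduce to an equality check like this; they typically go through human review or a model-graded rubric alongside the objective scores.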
Humanloop’s log management features help engineers quickly debug performance issues and take action to fix them. Humanloop’s reporting capabilities help Filevine’s AI team report performance and accuracy results to senior leadership within the company, helping them support production decisions.
Filevine had particularly stringent security concerns as they handle sensitive legal data. They were reassured by Humanloop’s SOC2 compliance and opted for Virtual Private Cloud (VPC) deployment. The whole Humanloop application is deployed in a private AWS Fargate instance where Filevine controls the encryption keys. This setup ensures that their log data is isolated and protected within a dedicated virtual network, offering enhanced security, control, and compliance with industry standards.
Unprecedented revenue gains and customer satisfaction
Increasing AI Product Release Velocity
By working with Humanloop while independently investing in its own proprietary AI infrastructure and development processes, Filevine has experienced record-breaking growth across its new AI product SKUs while saving critical time on internal product development cycles.
Now in production: multi-modal AI tools for faster legal review
In the near future, Brianna and the Filevine AI product team look forward to leveraging Humanloop to advance additional RAG and vector search projects, evaluate more open-source LLMs like Llama and Mistral, and explore new fine-tuning use cases to increase LLM performance.
To learn more about Humanloop, check out our docs and request a demo today.
1m+
Legal documents processed per day
Models used
Google Gemini
Google PaLM
Anthropic Claude
LLM Use Cases
Data Ingest & Extraction
OCR Image Recognition
RAG
Text Summarization
Field Population
LLM Self-Evaluation
Retrieval