How Replicate is Democratizing AI with Open-Source Resources

Raza Habib

Replicate is making AI accessible for all software developers through its API-based model library. In this conversation, we explored Replicate’s journey and how the platform empowers both hobbyists and large enterprises to use AI.

Subscribe to Humanloop’s new podcast, High Agency, on YouTube, Spotify, or Apple Podcasts

I recently sat down with Ben Firshman from Replicate, who revealed their mission to enable engineers without a deep AI background to work with advanced machine learning models. At the core of Replicate is its expansive model library, which houses thousands of open-source and proprietary models ready to use with a simple API call. Today, Replicate enables millions of developers to deploy applications without the infrastructure and maintenance hurdles traditionally associated with AI.

Some takeaways from the discussion:

1. Run models with one line of code

Replicate began as a solution for engineers frustrated with the difficulties of deploying machine learning models. Drawing from his experience at Docker, Ben realized that containerization could apply to machine learning, creating a standard for deploying models much like Docker simplified software deployment. With Replicate's open-source Cog technology, developers can package models into containers, bringing the "write once, deploy anywhere" philosophy to AI. This approach allows engineers to run and customize AI models without in-depth ML knowledge, making it easier for businesses to adopt AI solutions.

"Software engineers can take these models and run them with one line of code, without having to understand all the internals about how the model works, and without having to set up GPUs"

2. Iterate and refine AI solutions through real-world applications

Ben emphasizes that the best way to learn and improve AI tools is by using them. Many of Replicate's users start with a model and adapt it over time as they explore its capabilities and limitations. Through this iterative process, developers uncover new ways to refine model performance and customize it to meet user needs.

"My best advice is to just try these things and see how they behave at a high level. Just use them and understand how prompting works, understand the differences between all the different models and how they behave, how you can plug a bunch of models together to produce interesting higher-order behavior. This is as opposed to learning all of theory, which I think is worth knowing, but it's not going to help you build a great product."

Chapters

00:00 Introduction
00:29 Overview of Replicate
03:13 Replicate's user base
05:45 Enterprise use cases and lowering the AI barrier
07:45 The complexity of traditional AI deployment
10:24 Simplifying AI with Replicate's API
13:50 ControlNets and the challenges of image models
19:42 Fragmentation in AI models: images vs. language
25:05 Customization and multi-model pipelines in production
26:33 Learning by doing: skills for AI engineers
00:28:44 Applying AI in government
31:12 Iterative development and co-evolution of AI specs
33:13 Final reflections on AI hype
35:18 Conclusion

Podcast:

[00:00:00] Introduction

Ben Firshman: In some ways, I think it is simultaneously both over-hyped and under-hyped. I think both suffer from a lack of really understanding what these systems are and how they work. It could be anything of arbitrary detail - you know, a cat with a hat, smoking a cigar, sitting on a mat, and the mat is a magic carpet flying throughout space. I think something that the UK Government did so well is that it was just repeated again and again: start with user needs.

[00:00:29] Overview of Replicate

Raza Habib: This is High Agency, the podcast for AI builders. I'm Raza Habib. I'm delighted today to be joined by Ben Firshman, who's the CEO and one of the co-founders of Replicate, or Replicate.ai. Replicate is in the sweet spot of the kind of companies we want to speak to at Humanloop, because their core mission is to try and enable non-machine learning engineers and AI engineers broadly to be able to build with AI. So Ben, thanks so much for coming on the show.

Ben Firshman: Thanks for inviting me.

Raza Habib: So Ben, to start with, help me fill in the background for listeners to make sure we all start off in the same place. We're going to dive into building with AI and advice you have, but for the beginning, just what is Replicate? Why might an engineer want to use it? What's the scale of the company today? Just give us a little background.

Ben Firshman: So Replicate lets you run AI models in the cloud with an API. The company came from my co-founder, who was a machine learning engineer at Spotify. This was back in the day - this was eons ago in AI terms. This was maybe about five years ago when he left Spotify. Back then it was called machine learning. It was very much an academic pursuit. All of the new advances were published as academic papers on arXiv. A lot of his job at Spotify was to try and productionize these machine learning models, but they just didn't work. More often than not, if you were lucky, you had scraps of code on GitHub, but you couldn't really get those running. Maybe you didn't have the weights or all this kind of thing.

So I connected this back to my work at Docker. I worked on products at Docker and created Docker Compose, which is one of the core tools. We'd kind of solved this problem for normal software by just telling software engineers to put their work inside this metaphorical shipping container, and then you knew that you could ship it to other developers, and they could run it. You could send it to different clouds, and it would run in production.

So we wanted to bring that same analogy to machine learning. We created this core open source technology called Cog, which lets you package a machine learning model as a container, and then Replicate is the place where all of these machine learning models are published. We've got this huge library that all of these researchers and hackers have published on Replicate - tens of thousands of production-ready models. Software engineers can take these models and run them with one line of code, without having to understand all the internals about how the model works, and without having to set up GPUs and all this kind of thing. They can just copy and paste the line of code and be using AI. On top of this, people can deploy custom models to Replicate. If you're customizing the models and doing custom things, you can take these open source models as starting points, but you can just deploy arbitrary code to Replicate as well.
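
As a rough sketch of what Ben describes, a Cog package is a small Python class plus a `cog.yaml` that declares the environment and points at that class; Cog then builds the container. The predictor below is a stand-in rather than a real model, just to show the shape:

```python
# predict.py - referenced from cog.yaml as `predict: "predict.py:Predictor"`.
# A real predictor would declare GPU, Python version and dependencies in cog.yaml
# and load actual weights in setup(); this stand-in only shows the structure.
from cog import BasePredictor, Input, Path
from PIL import Image  # stand-in for a real model's dependencies


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts; real predictors load weights onto the GPU here.
        pass

    def predict(self, prompt: str = Input(description="Text prompt")) -> Path:
        # Runs per request: pre-process, run inference, post-process.
        # Here we just write a blank image so the sketch stays self-contained.
        out = Path("/tmp/output.png")
        Image.new("RGB", (256, 256)).save(out)
        return out
```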

[00:03:13] Replicate’s User Base and Scaling

Raza Habib: And I'm right in thinking that now it's sort of on the order of tens of millions of users and thousands, or tens of thousands of companies that are sort of paying customers of Replicate today, so it's operating at significant scale?

Ben Firshman: Yeah, on the order of single millions of users and hundreds of thousands of developers, and tens of thousands of customers.

Raza Habib: That's particularly interesting to me. I think Shawn Wang, you know, aka Swyx, who coined the term AI engineer, talks about this boundary of the API line, right? The type of engineer who's building machine learning models from scratch or working on the architectures, and then on the other side, people who are consuming them and building products with them. And clearly Replicate is living firmly on helping people who are on one side of that API line. It feels like quite a new thing for a lot of software engineers who are actually using Replicate. Can you help me understand a little bit about the user base, how it's grown over time? What's the makeup of hobbyists versus people who are building real production systems? Is it 30 million people tinkering on the weekends, or what are people building with it?

Ben Firshman: Yeah, it's a whole mix of things. I think primarily we like to say, using the term coined by Swyx, they're AI engineers - they're a bit like software developers, sophisticated software developers who are figuring out AI. It's not like "how does the math work?" It's more about how do I prompt a model, what models do I use to solve my problem, and how do I combine them together to solve some problem in the product I'm building? The primary thing they're doing is building a product or solving a problem, and that really is the core of our users.

I think we've got lots of deep machine learning experts as well, the sort of people who are publishing models, but most people on Replicate are using the models to build products. In terms of numbers, I think most people are kind of tinkering and experimenting with it. They might just be running the model to solve a problem, like cleaning up an image as part of their work. But we've also got plenty of people who are working on side projects integrating these models, working on experimental projects at work and stuff like that.

But we've also got plenty of people who are building really serious companies on Replicate as well. We've got whole startups, companies like Tavis or Labelbox, who are effectively building their whole companies on top of it because they're AI startups. But we've also got plenty of larger companies using us as well. We've got a number of large enterprises who are using us because they're all experimenting with AI in some form, and they're not necessarily experts at deploying and running these models.

Raza Habib: For those larger enterprises, do you mind sharing any use cases that you have in mind? Are these enterprises typically technology companies, or are we now also in the realms of companies that you wouldn't otherwise think of? If Replicate didn't exist, they probably wouldn't be able to use AI, but because we've kind of lowered the barrier to entry, have we opened up new use cases?

Ben Firshman: It's a mix of things. To give you an idea of the kind of use cases, we've got people who are running custom language models. Plenty of enterprises are using OpenAI and Claude APIs, for example. But in some cases, they are training their own models. They might be fine-tuning a model or training a model from scratch, and they need that hosted somewhere. They could do it themselves, but it's a lot of work to set up, and they'd rather just hand that off to someone who's the expert.

We have, interestingly, a lot of marketing departments using us, because one of the core things that people use Replicate for is image generation. We've got lots of companies who are generating assets for marketing using image generation models on Replicate, which I think is really interesting. We've also got several advertising companies who are using us for a similar kind of reason. We've got big game studios who are generating game assets on Replicate - a few household names, which unfortunately I can't mention, but several of them are generating stuff on Replicate.

It's a whole swath of things - this combination of technology companies, but also some really traditional companies, who are using us either for marketing assets or for apps that they're building on the side that are almost like marketing assets for their product. That's just a sample of what these big, sophisticated companies are doing. And it's not like they're retooling the whole company on generative AI at this stage - it's just a few individual tools, a few individual products on the side, or features inside their core products.

[00:07:45] The Complexity of Traditional AI Deployment

Raza Habib: Can you help me understand what it would have been like to deploy a machine learning model, either pre-Replicate or even just a couple of years ago, when tooling in general was just a lot less sophisticated? What was needed before, and what does it look like now? Just to help a listener understand what's changed, and therefore, what opportunities are available to them - what things can they do now that maybe they wouldn't have considered before because the barrier was higher?

Ben Firshman: So suppose you have a bunch of GPU machines, and you want some machine learning model deployed on them. Let's run through step by step all the things you might have to do. First of all, you have some model weights that you want to get deployed - you first have to wrap those up in some code that actually loads and runs that model, and it might do some pre- and post-processing. You then need to wrap that up in some kind of API or queue server. Queuing servers usually work better for AI workloads for various reasons - standard web deployment systems aren't well suited to them.

So you're building some kind of queuing system around this model, or queuing client for this model. Then you need to package it up inside a Docker container with all of the right dependencies and the right CUDA libraries and video drivers to make this all work correctly. You then need to deploy a Kubernetes cluster on your servers. You need to deploy this container that you've built on that Kubernetes cluster. You need to version that thing and roll that thing out, and do rolling deploys.

You then need to create some way to feed this thing with jobs - so you need some kind of queuing system. And there's all sorts of complexities in building out task running systems. You need to auto-scale this thing if you've got varying load. Within that API server, there are all sorts of complexities in making that run correctly. You need to batch jobs on the GPU. You need to use the GPU efficiently. You might need to optimize the model to get it to run efficiently. And then you need to test this. Testing and monitoring machine learning is very different from normal software. You can't necessarily just write a unit test. You have to monitor the live system and see how it's performing. You kind of get the idea - there's all the complexity of deploying a normal application with lots of nuance and variation that makes it much more complex, particularly when you're dealing with GPUs. And for a start, you have to procure the GPUs, which is a challenge in itself.
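
To give a flavour of just one of those pieces, here is a heavily simplified sketch of the kind of queue worker Ben describes, using Redis as the queue. The queue name, payload shape and model call are assumptions for illustration; the real thing still needs containers, Kubernetes, autoscaling, GPU batching and monitoring on top.

```python
# pip install redis
import json
import redis

r = redis.Redis()


def load_model():
    # Placeholder for loading real weights onto a GPU once at startup.
    return lambda prompt: f"(generated output for: {prompt})"


model = load_model()

while True:
    # Block until a job arrives on the (hypothetical) "inference-jobs" list.
    _, raw = r.blpop("inference-jobs")
    job = json.loads(raw)
    result = model(job["prompt"])           # run inference
    r.set(f"result:{job['id']}", result)    # hand the result back to the API layer
```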

[00:10:24] Simplifying AI with Replicate's API

Raza Habib: And post-Replicate - Replicate being a great example - what does that look like now? What are the models capable of that people have access to? So the experience now, if I understand correctly, is I would come to the Replicate website, get my authentication key, and there's a whole family of models that are already hosted there that have standardized APIs. I can send in text, images, whatever it might be, and get back predictions, image generations, text generation. That wasn't possible a few years ago, right? And it wasn't just not possible because Replicate didn't exist, but because for most machine learning use cases, people were having to build very custom models. The types of applications were very use case specific. So can you talk me through what models you're hosting? What are the types of models that you have, and what's changed with Gen AI that means it makes sense for there to exist an ML-as-a-service API company when that wasn't true a few years ago?

Ben Firshman: We host a huge variety of generative AI models - every kind you can imagine: things that generate text, image, audio, video assets, things that modify or turn one image into another image with some kind of variation, or things that translate between all of those modalities. It's both open source things and proprietary models in our library because we want to make all of these accessible to people in one place.

Notably, compared to when we started this company, the models we had back then were very specific use cases because they were much smaller models. For image use cases, it was relatively simple stuff like image segmentation, or image embedding models. The next step above was things that kind of looked like the generative models we have right now, but were just much simpler models. We had GAN models that could just generate images from some vector, and we had models that modified people's faces. It's similar to the general-purpose image modification models we have right now, but they were really fine-tuned on specific tasks, like "take a frowning face and make it smile." It was similar techniques, but just much smaller models.

I think what's really changed now is we just have much bigger general-purpose models, and instead of training a particular model to do a specific thing, you can prompt a model to solve that task. That's really what's changed over the past few years. There's some nuance to that, though: we find people who are doing very sophisticated stuff still need to customize the model in some way. We see lots of people fine-tune image models and pick apart the image models to do interesting things with them. They might plug in a ControlNet or something to produce output in a particular format. That really requires being at the code layer to edit the models.

[00:13:50] ControlNets and the Challenges of Image Models

Raza Habib: For someone who doesn't know, could you quickly fill in the gaps for us? What does it mean to use a ControlNet? What does it make possible? And as part of that explanation of a ControlNet - you mentioned having original GAN models - not everyone will have played with image models. People will probably have seen DALL-E and other stuff, but I think people are much more familiar with using language models than they are with playing with the image models themselves. So can you give us a sense of what kind of quality of output is possible and how controllable it is? Just give us a little flavor of the current state of the art. Even I think I may not know this anymore because it moves so quickly - it's been a couple of months since I last looked at this in detail. I suspect I'm out of date already.

Ben Firshman: So Flux is the state of the art right now for image models. It's incredibly good, and particularly good at prompting and composition of images. You can describe in great detail the kind of image that you want, and it will be able to lay out that image for you, very nearly photo-realistically now, or in whatever particular style you want, which is quite extraordinary, particularly from an open source model with a commercial license.

Ben Firshman: It basically lets you write a text prompt describing the image you want, and it will give you an image of that thing. It could be anything of arbitrary detail - you know, a cat with a hat, smoking a cigar, sitting on a mat, and the mat is a magic carpet flying throughout space - and it will give you a really crisp image of precisely that thing.

Raza Habib: How good is it at following those instructions now? Because the earlier versions of these models were okay, but they wouldn't be precise in giving you what you asked for.

Ben Firshman: Very precise. Obviously, there's a limit at some point if you get too detailed, but you can give it that level of detail - including, like, the cat is actually sitting on the shoulders of a dog on that mat, flying through outer space - and it'll be able to compose that image with enough accuracy. And you can say this object is on the left, these objects are on the right, and this person is wearing a purple t-shirt, and it will be able to compose all of these things in great detail.

Raza Habib: It is quite extraordinary. I think it's worth pausing on for a moment, actually, and for two reasons. One, I'm old enough, sadly, to remember the first GAN papers when they came out. I'd recommend anyone listening to this to go and look at the image generations in the first GAN paper. This is around 2014-2015 - Ben's looking it up right now - and they're terrible, like they're grainy pictures of MNIST digits and CIFAR images. And that was really exciting at the time. That was a little under 10 years ago.

Ben Firshman: Like, a fuzzy 20 by 20 picture of a giraffe.

Raza Habib: Yeah, and you can just about tell it's a giraffe, right? But it's not obvious. People were really excited by that. And then when DALL-E came out, there was a whole bunch of deep learning naysayers who said, like, yes, it can do these things, but it'll never be able to do compositionality. The argument from people like Gary Marcus was that neural networks won't be able to, you know - they can put the red block next to the blue block, but they won't say which one is above or below. And now you're telling me, and I believe you because I've seen close enough examples, that not only can they do compositionality, but they can do it in very complicated and nuanced and specific ways already. And DALL-E was two years ago, three years ago, and the original GAN paper was less than a decade ago. So we've gone from a fuzzy pixel, maybe of a giraffe, to something sitting on the shoulders of something on a flying carpet in space.

Ben Firshman: It's absolutely extraordinary, and it's funny being in this space - it feels like a frog getting boiled in water, and you step back and think, holy crap, computers can do these things now.

Raza Habib: And the reason I wanted to stress that point was that if you are listening to this and you're an AI engineer, or someone who's building AI products, and you haven't played with the image side of things yet, then I think you might be surprised by how much better it's gotten recently. I'd encourage people to try and check that out.

Ben Firshman: Absolutely. And I think some of the other things that are extraordinary with these image models, as well as what you were hinting at before, is that there's this basic text-to-image stuff, but it's very hard to describe some things with prompts, particularly in the image world, where it's just a very high dimensional output. There's an example of this which is very common in image models - you might want to have that image model produce things in a certain style that is the same style as your video game or your illustration style. Or you might want to output an image with your face in it, or a particular object that is not in its training data or very hard to describe with a prompt for whatever reason. And to do that, you can fine-tune these image models, and it is unreasonably effective. Fine-tuning these image models is really extraordinarily easy to do, particularly compared to fine-tuning open source language models, which is actually surprisingly finicky and you need a lot of data.

Raza Habib: How much data do I need if I want to fine-tune one of these things? If I want to put my face into an image model and be able to reliably create it, or I have a particular cartoon style I want to customize to - are we talking 5, 10, 1000? What's the data volume that I need to do this?

Ben Firshman: Ten images. Sometimes you can get it with five, which compared to language models where you need on the order of thousands, maybe tens of thousands of examples to get it good at something. It's able to do these things with ten images, which is just really extraordinary. And it's very simple to do - you don't need to do a lot of tweaking of parameters. You just dump ten JPEGs into this thing, and it's producing incredibly crisp images of whatever you want, in whatever scenario you want, which is really extraordinary.
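
For readers who want to try this, the shape of a fine-tune launched through Replicate's trainings API looks roughly like the sketch below. The trainer model name, version id and input keys are placeholders; each trainer documents its own inputs, so treat this as the shape of the call rather than a copy-paste recipe.

```python
import replicate

# Kick off a fine-tune from a zip of ~10 images; the resulting weights land in `destination`.
training = replicate.trainings.create(
    version="owner/image-trainer:version-id",                     # placeholder trainer model
    input={
        "input_images": "https://example.com/my-ten-photos.zip",  # zip of ~10 JPEGs
        "trigger_word": "TOK",                                    # token that will stand for the subject
    },
    destination="my-username/my-custom-model",
)
print(training.status)
```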

[00:19:42] Fragmentation in AI Models: Images vs. Language

Raza Habib: Something I wanted to chat about was the breakdown of different types of models. I get the impression that Replicate maybe leans a little bit more towards images. I know that you guys also have audio generation models, open source language models as well. And for LLMs, we've got these very general purpose foundation models with a small number of providers that have generally dominated. For some reason, it feels like with image models, there's many more different image model providers and lots of different types of models. Why is it more fragmented on the image side than the language side? Do you think that will persist? Is that something that is fundamental to these models, or is it just a coincidence? What's going on?

Ben Firshman: I think it's a combination of real reasons and cultural reasons. I think the cultural reason is actually that image models, from the start, have been open source, whereas language models have erred towards being closed source. The first open source generative image model that really kicked off - lots of stuff was going on before this, but the first one that really kicked off was Stable Diffusion. It wasn't until a year later that we got Llama 2, which was the first real generative text model that was kind of big and useful.

So image, video, audio, 3D, all these kinds of things have just culturally been open source for whatever reason. I think there are also some real reasons why it's been like this. Language models tend to be much larger because there's just a lot more - for lack of a better way of describing it - more intelligence involved. So they just need to be much larger models. And because of that, they tend to be closed source. They tend to be hosted by these big providers because they're so much more expensive to train.

And I think there's also an interesting technical thing in there as well, where for whatever reason, with non-language models, people tend to have to customize them more to get the results they need. They need to be able to fine-tune them. They need to be able to tinker with the code.

Raza Habib: Is that because prompting is less effective as a medium of controlling an image?

Ben Firshman: Exactly - prompting is less effective, so you have to fine-tune it. You might have to pipeline it with other models. You might have to use these techniques like ControlNets. ControlNets are just things that let you output things in a particular shape, or modify just part of this image, or output this thing but it should look like that - it's just ways of controlling the output of models in various ways.

Ben Firshman: And for that, you kind of need to be at the model layer - you need to be tinkering with the code and stuff like this, which kind of implies that people will be open-sourcing models more, because the customers demand that they need to be able to do that stuff. And naturally, there's a sort of survivorship bias in there that the ones that people tend to use are the ones that they can tinker with.

I think the core of it is that it's much harder to prompt these things, so you really need to be doing things at a sort of lower level. And another interesting thing about this is that strings of text are relatively easy to process, but these other modalities' outputs are actually often much lower level. If you're producing some kind of live interactive experience where you're maybe doing live translation or something like that with audio - that's actually quite a low-level systems plumbing problem. You can't just call an API to do that, so people actually need to be interacting at the code layer with these models.
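
As a hedged illustration of the ControlNet idea Ben describes - conditioning the model on a structural input so the output keeps that layout - a call might look like the sketch below. The model identifier and input keys are placeholders; each ControlNet-style model on Replicate documents its own inputs.

```python
import replicate

# Generate an image whose composition follows a supplied sketch/edge/pose image.
output = replicate.run(
    "owner/controlnet-style-model:version-id",  # placeholder model identifier
    input={
        "prompt": "a watercolour house in the same layout as the sketch",
        "control_image": open("layout_sketch.png", "rb"),  # structural input that steers the output
    },
)
print(output)
```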

Raza Habib: Do you think that will change as the models become more multi-modal? Is it possible that this is a kind of transitionary phase where people have to go and figure out which image model is right for their use case, and it'll consolidate? Or do you think there's some fundamental reason why it'll continue to be dominated maybe by open source or by lots of different models?

I guess if you told me four years ago that language models would get good enough that I wouldn't need to do any fine-tuning, and that anyone could kind of just hit an API in a very generic way, I would have been somewhat skeptical. I would have thought they need to get access to the model internals, they're going to want to be doing fine-tuning, they need to be able to do stuff at different layers - you know, back before, people used to freeze a lot of the layers and fine-tune, say, the last layer, or things like this. They had to have access to the model to do this. Or if you want to do LoRA or something - why wouldn't that also happen for other models? Or will it happen for other models?

Ben Firshman: I think there's some equilibrium here that we've yet to find. But for some of the reasons I explained before, I think there are some fundamental reasons why this is a bit different. But I think either way, regardless of whether it's the big proprietary models that do everything, or these small models that you tinker with or do very specific tasks, I think you're still going to need to combine these models together in interesting ways. Whether it's at the API layer or the model layer, there's going to be some need to be able to customize and pipeline and produce the overall composition.

Raza Habib: With prompts and with datasets and inputs, it's never, or it's almost never, just the model itself.

[00:25:05] Customization and Multi-Model Pipelines in Production

Ben Firshman: Yeah, and I think something we find as well is, even if there are really giant, multi-modal models that do everything in production, two things happen. One thing is that you want to reduce costs, so you want to use smaller models that are more specialized at what they do, because it's just much cheaper to run those things, and it's faster for your product as well.

But another thing that tends to happen when you actually put things to production is that you can't just use one big model and magic will happen. There tends to be a lot of duct tape and heuristics and custom plumbing to actually make something work in production. In practice, we find people need to customize the models. They need to put their own duct tape around this thing to make it behave in the way they need. And maybe at some point we're going to get these perfect God models that just produce products, but I think we're a fair way from there.
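
A small sketch of that "duct tape" around a pipeline of two specialized models follows. Both model identifiers and the retry heuristic are assumptions, chosen only to illustrate the pattern of chaining smaller models and wrapping simple checks around them.

```python
import replicate


def generate_product_shot(description: str) -> str:
    # Step 1: a small language model rewrites the user's description into an image prompt.
    prompt = "".join(replicate.run(
        "meta/meta-llama-3-8b-instruct",       # example of a smaller, cheaper LLM
        input={"prompt": f"Rewrite this as a concise image-generation prompt: {description}"},
    ))

    # Step 2: a specialized image model renders it; retry once if nothing comes back -
    # a crude stand-in for the heuristics real pipelines accumulate.
    for _ in range(2):
        images = replicate.run(
            "black-forest-labs/flux-schnell",  # example image model
            input={"prompt": prompt},
        )
        if images:
            return str(images[0])              # URL of the generated image
    raise RuntimeError("image model returned no output")
```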

Raza Habib: Fair enough. Changing tack slightly - you didn't start off in machine learning, right? Your history has been as a software engineer, building tools like Docker Compose, which is, I think, probably what you're most well known for before Replicate and a whole bunch of other things. Obviously, you've had to learn a lot about machine learning and AI in this move to being the founder of Replicate. What skills or knowledge did you find most useful to learn? And what would you recommend to others who maybe are in a similar position to where you were a few years ago, who want to get up to speed? What is it worth them learning versus not learning? Where should they be spending their time? What skills are important?

[00:26:33] Learning by Doing: Skills for AI Engineers

Ben Firshman: My best advice is to just try these things and see how they behave at a high level. Just use them and understand how prompting works, understand the differences between all the different models and how they behave, how you can plug a bunch of models together to produce interesting higher-order behavior. This is as opposed to learning all of the theory, which I think is worth knowing, but it's not going to help you build a great product. Unless you're doing some really low-level stuff and really advanced stuff, you really don't need to know that. There's just so much unexplored territory in just using these models off the shelf for interesting things, exploring different ways of using language models and exploring different ways of using image models. There are just so many unexplored parts of the map still to be found.

I suppose I'm sort of lucky in that I did learn about neural networks in college. So I do have some understanding of how these things work internally. And it's really helpful to be able to understand the full stack of how something works in the same way when you're writing code - it's helpful to understand what the compiler is doing, what the CPU is doing, depending on how low-level you're going. But if you're just making a website or web products, you don't need to know what's in the L1 cache and what isn't. In the same way, you don't need to understand what's going on with these neural networks.

Raza Habib: And so would your advice be best summarized as the best way to learn, in this case, is actually to just learn by doing? Fundamentally, it's not something where you need to go and buy a book and read about it. Actually, the quickest way to get up to speed is just start off building things, and you can get going pretty fast?

Ben Firshman: Yep, I would recommend just grabbing GPT-4 and just start building. Just start experimenting with it and see how it behaves, and see what you can do with it. There are so many examples of people building stuff on GitHub or something to use as a starting point.
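
Ben's advice in its most minimal form, as a sketch using the official OpenAI Python client (it expects an `OPENAI_API_KEY` environment variable; the model name and prompt are just examples):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()
reply = client.chat.completions.create(
    model="gpt-4",  # any current GPT-4-class model works here
    messages=[{"role": "user", "content": "Suggest three small AI side projects I could build this weekend."}],
)
print(reply.choices[0].message.content)
```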

[00:28:44] Applying AI in Government

Raza Habib: Two more questions, one quite specific to you, and then one I ask everyone. One that I was just curious to get your thoughts on is, I know from speaking to you in the past that in a past life, you used to be a software engineer in the civil service in the UK, actually working in government. I was curious if you had any thoughts or lessons or learnings from that time about how you think governments might be able to make use of AI more, and where are the opportunities for your old teams to potentially be applying AI?

Ben Firshman: I think something that the UK Government did so well is that it was just repeated again and again: start with user needs. Because so often in government, people started with "What do we need to do? What's the purpose of our department inside government?" instead of thinking, "Actually, what does the citizen need to do?"

Ben Firshman: It's a very similar thing inside enterprises. In some sense, governments are very similar to large companies in this way, and the key thing is just to start with: what does the user need? So I think the first thing is just thinking about what citizens need to do, what your customers need to do in a large company. And being like, "Okay, we have all this new technology now. Could we solve that problem better with this new technology?"

So often people start with technology. They think, "Oh, we got this new technology. Let's try and make something." A really classic one in the AI world is like, "Let's build a chatbot." And it's not necessarily clear whether the user actually wants a chatbot, or whether they just want a web page with that information, or whatever the other alternatives might be.

Raza Habib: It's one of these lessons that is easy to say, feels obvious in retrospect - like obviously work backwards from some user need - and remarkably easy to accidentally forget to do in practice.

Ben Firshman: I think that's an interesting lesson in here as well, actually, for how to build really great products, because it is complicated. What's so complicated about AI right now is like I was talking about before - there's just so many unexplored parts of the map. We don't even know what these systems can do yet. So how could we possibly map things that customers need to do to a system we don't even know the capabilities of? And so I think the trick here is doing a bit of both. You can't purely - well, you can actually purely start with user needs, because you do know some things that the system can do, but you can't purely start with technology.

[00:31:12] Iterative Development and Co-Evolution of AI Specs

Raza Habib: That's really true. Something we see in practice is there's often a bit of a co-evolution of the exact spec with the system itself. People will start off doing their thing - they'll come in with some goal in mind, and then they'll either hit the limitations or discover the models are able to do something that they otherwise couldn't. And in the process of generating output from the model and tweaking prompts and adjusting it, they actually gain better insight into what they want.

I'm trying to think of concrete examples that I'm allowed to talk about, but I've certainly had this experience personally. The most recent time I had this experience was when I was trying to build an internal tool for the team that would summarize all of our sales interactions. We record all of our sales calls, and I wanted a weekly digest for the team that would say, "Hey, these are the customer things that came up this week." That's a very high-level concept - it's come from a real user need. I want my team to be aware of what's possible, but actually the exact details of the nuance of what should be in that report and what information is most useful to include had to co-evolve as a combination of what the model was able to accurately summarize and pull out across lots of calls, as well as what I wanted the team to know.

Raza Habib: And as I was generating more and more of these reports, I'd go, "Oh, actually no, this kind of information is useless." Like, I thought that I wanted all the feature requests, but when you see it all listed out, they're too heterogeneous. I actually need a summary that is more meaningful. And so there's a co-evolution of what can the models do, and what do I want, and what does the user want? That I think is very hard to do if you just decide up front what the spec is. And I think more so than most software development, it needs to be highly iterative.

Ben Firshman: Yeah, 100%. My best advice to people building AI products, particularly in a large company asking, "Okay, how do we use AI?", is not to come in with a bunch of specs like, "Okay, these are the products we need." Just try 100 things and see what sticks. That is the only way to build with AI right now.

[00:33:13] Final Reflections on AI Hype

Raza Habib: All right. The last question for me is the same one that I try to put to almost everyone, and it's: do you think AI today is over-hyped or under-hyped? Or, if you want to have a little bit more nuance, where are the places where you think it's over-hyped versus under-hyped?

Ben Firshman: In some ways, I think it is simultaneously both over-hyped and under-hyped. And I think both suffer from a lack of really understanding what these systems are and how they work. I think there's one end of the camp which says, "We have these incredibly advanced, generally intelligent systems." And there's another end of the camp, which usually comes from software developers, where "Oh well, it's just statistics" is the meme - treating them as relatively incapable systems.

And I think the reality is somewhere in between. We have these extraordinary systems that are able to do things that computers not only couldn't do before, but we didn't even really conceive that computers could do these things. Generating these incredibly crisp images is just something that's happened so fast, and it's so unexpectedly good at it. And a lot of these things, the large language models do as well, and it's improving at such an incredible rate.

But it's still a long way away from this fuzzy idea of just like this completely general, intelligent system that can do everything. We actually have these very practical systems that can do very concrete tasks that behave a lot like software. This is a thing that's accessible to software engineers, that software engineers can do. It's not just this very abstract concept of what AI is, and I find that quite a hopeful thing, because we have these very extraordinarily capable systems that are very concrete, that can be used for real work, and are improving at an extraordinary rate. And depending on what misinformed person you talk to, I think that reality is both under-hyped and over-hyped.

[00:35:18] Conclusion

Raza Habib: Yeah, that does make sense. It definitely resonates. And maybe the fact that a lot of the use cases that we could think of were quite boring or quite mundane speaks to the fact that it's actually useful now, right? Like it becomes boring when it transitions to being actually applicable. And so maybe, as you say, that is quite hopeful at the same time.

About the author

Raza Habib
Cofounder and CEO
Twitter: @RazRazcle
Raza is the CEO and Cofounder at Humanloop. He was inspired to work on AI as “the most transformative technology in our lifetimes” after studying under Prof David MacKay while doing Physics at Cambridge. Raza was the founding engineer of Monolith AI – applying AI to mechanical engineering, and has built speech systems at Google AI. He has a PhD in Machine Learning from UCL.