Principles for Building Excellent AI Features

Raza Habib

How do you build AI tools that actually meet users’ needs? In this episode of High Agency, Raza speaks with Lorilyn McCue, the driving force behind Superhuman’s AI-powered features. Lorilyn lays out the principles that guide her team’s work, from continuous learning to prioritizing user feedback. Learn how Superhuman’s "learning-first" approach allows them to fine-tune features like Ask AI and AI-driven summaries, creating practical solutions for today’s professionals.

Subscribe to Humanloop’s new podcast, High Agency, on YouTube, Spotify, or Apple Podcasts

Chapters

00:00 - Introduction
04:20 - Overview of Superhuman
06:50 - Instant reply and ask AI
10:00 - Building on-demand vs. always-on AI features
13:45 - Prompt engineering for effective summarization
22:35 - The importance of seamless AI integration in user workflows
25:10 - Developing advanced email search with contextual reasoning
29:45 - Leveraging user feedback
32:15 - Balancing customization and scalability in AI-generated emails
36:05 - Approach to prioritization
39:30 - Real-world use cases: the versatility of current AI capabilities
43:15 - Learning and staying updated in the rapidly evolving AI field
46:00 - Is AI overhyped or underhyped?
49:20 - Final thoughts and closing remarks

Podcast:

[00:00:00] Lorilyn McCue: I would really focus on two principles. Principle one is optimize for learning. Principle two is integrate AI seamlessly into the product. Honestly, my dream is that people use AI features and don't realize it's an AI feature. There's so many other things to do right now that that is what's interesting to me. Like there is an adventure to be had immediately, and I want to have it.

[00:00:27] Raza Habib: This is High Agency, the podcast for AI builders. I'm Raza Habib, and I'm delighted today to be joined by Lorilyn McCue. Lorilyn's had a really interesting and not totally conventional path into AI - she started as an Apache helicopter pilot for the US Air Force, taught PE at West Point, and then transitioned into tech. She was a product manager at Impera and Slack before her current role, which is what we're going to focus on today: she's the Product Manager at Superhuman responsible for AI. So Lorilyn, thanks so much for coming on the show.

[00:00:57] Lorilyn McCue: Thanks for having me. And one correction, it's not the Air Force, it's the Army. And a lot of my friends would get really mad at me if I did not correct that.

[00:01:05] Raza Habib: I'm going to leave that in, because I think it's interesting to have that correction on the record. That's great. Well, I'm super excited to dig into what you've built at Superhuman and the AI features, and how you think about AI product development. To start with, just to get everyone to the same baseline, for those who maybe haven't used Superhuman and who don't know about the email client, can you just give us a little bit of an overview of Superhuman itself, and also a sense of the scale of it, and then maybe we can talk about the AI features and what it was like to build them?

[00:01:34] Lorilyn McCue: So Superhuman is the most productive AI-powered email app ever made. It can save you up to four hours a week. We found that it's incredibly popular with venture capitalists, with CEOs, with salespeople, especially anyone who gets a ton of emails, for whom email is probably the least favorite part of their day.

[00:01:58] Raza Habib: Can you help me understand, sort of literally, what it is like? How does it do all of this? What's different about Superhuman from a normal - so it's an email client of some kind?

[00:02:06] Lorilyn McCue: Email client, yeah. So a couple things that are special about Superhuman. It has a really heavy philosophy around keyboard shortcuts. So it has this amazing onboarding tutorial to teach you how to use keyboard shortcuts to get through your inbox way faster. It also has AI features embedded into your email directly to make it faster for you to get through your inbox and find answers.

[00:02:29] Raza Habib: So we've established Superhuman is like one of the leading platforms for busy people to be doing email. I think you were also one of the first people to start incorporating AI features soon after ChatGPT came out early last year, and you've got a bunch of AI features now. I think there's Write with AI, Instant Reply, and then more complicated things like Ask AI. Do you mind just telling us a little bit about those features? What the most interesting ones do? And then hopefully we can dive into the process of what it was like to build them.

[00:02:56] Lorilyn McCue: So I personally am obsessed with efficiency. I would love to cut all the cruft out of your work life and make it so that you are just working on the things that actually require your brain. And so we've basically tackled the inbox one annoying task at a time. So people write up to 50 emails a week. So one of the first things we said is, okay, it can take up to two minutes to write an email. Let's see if we can cut that down for people. So we built Write with AI, which allows you to convert a short phrase into a full email, which is pretty exciting.

We also have Instant Reply for those emails that maybe only take a sentence or two to get an answer out, like, "hey, this sounds great. I'd love to meet next week" or "No, thanks. I'm not interested in that time." So to bring that two minutes per email down to something like just seconds, is something that is really exciting to me.

So that's the first part, Write with AI and the Instant Reply. The other part is reading emails. So you're gonna get a ton of emails, and so what we have is auto summarize. Something really cool about Superhuman is that this isn't a feature where you have to click something to get the summary. The summary is on every single email. So we chose to get that summary the moment you get the email. So when you open the email, you have the summary already there. It's a one-line summary, just to give you the quick gist of the email. And if you click on that, there's little bullet points that say, okay, here are the details that you actually really need to know.

The third one is Ask AI. So not only are you saving time writing and saving time with reading, but people spend like, half an hour a week searching, which is just so painful. I mean, maybe you know the keyword searches. Maybe you know the right phrases, the from and the to and the before and the after. We decided this is going to be way easier if we can just index your emails and you can ask, "what are the top responses to our recent launch of Instant Event," or "when am I meeting with Lauren?" or "when did so and so last email."

So the ability to really ask instead of searching was something that was exciting for us to tackle. And then just today, we launched a feature called Instant Event, which I'm super excited about. This is a work feature, but I am a parent, and I get a ton of parent emails about events for my children, and we have the ability to turn an email into an event in one click. So literally, you hit B and it'll be like, "Okay, your kid's soccer practice is tomorrow at 3pm, don't forget this."

[00:05:32] Raza Habib: And I can see there's kind of a range of complexity there, from the kind of generate me a message from a few bullet points right the way through to Ask AI, which I know is probably a much harder feature to build, or there's a lot more going on under the hood. I remember there was a really interesting distinction that I heard your CEO Rahul bring up, where he spoke about the distinction between on-demand versus always-on AI with the on-demand being things like, generate me the message where you put some bullet points in, and it gives you the message versus what you mentioned with the summaries. And I imagine there's like a lot of difficult, actual execution challenges to building the Always On version. Would you be able to tell us a little bit about that? Like, I guess the on-demand one, you only have to trigger it when a person puts a message in. You don't have to maybe worry about cost quite as much. What are the challenges that come up with building an always-on feature like that?

[00:06:25] Lorilyn McCue: Yeah. So with an always-on feature, it is very expensive, and you care a lot about latency and you care a lot about quality. So we've noticed that most of our other competitors, they have summaries if you ask for them, which makes sense. You know, you don't want to process every single email. What if somebody doesn't read it? Well, we decided, you know, we don't care about the cost. We want to make sure that everyone has this feature. They don't have to think about it. In fact, I've had interviews with people, new users of Superhuman, and they've never realized that that's an AI feature. It just, like, feels a part of the product. Like, "Oh yeah, there's just a summary right there." And to us, that means that the moment your email comes in, we send it to OpenAI, we get the summary. We get the instant replies. It's there, it's available before you even open the email.

[00:07:13] Raza Habib: If you're having to do that on every email that comes in, are you doing anything smart to reduce the cost or using smaller models? Are you kind of deciding whether or not to do summaries in certain emails? What's kind of going on under the hood there?

[00:07:26] Lorilyn McCue: Yeah, there are definitely emails we don't summarize. So we don't, like, summarize marketing emails; you probably don't care about getting a summary on those. We don't summarize spam, obviously, and if it doesn't go in, like, your main inbox, we don't summarize that. But it's pretty much everything other than that. Yeah, and we definitely picked the best model for the job, both in terms of quality and in terms of speed. So it's gotta be fast, it's gotta be accurate, it's gotta be able to handle more complex cases.
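As a rough sketch of what an always-on pipeline like this could look like, assuming the OpenAI Python SDK, an invented `Email` type, and a placeholder model name; Superhuman's actual filtering rules and model choices are not public.

```python
from dataclasses import dataclass
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

@dataclass
class Email:
    sender: str
    subject: str
    body: str
    folder: str  # e.g. "inbox", "marketing", "spam"

def should_summarize(email: Email) -> bool:
    # Skip categories the user is unlikely to want summarized (marketing, spam, etc.).
    return email.folder == "inbox"

def summarize_on_arrival(email: Email) -> str | None:
    """Called the moment an email arrives, so the summary is already there
    before the user ever opens the message."""
    if not should_summarize(email):
        return None
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder: the fastest model that still meets the quality bar
        messages=[
            {"role": "system", "content": "Summarize the email in one line for a busy professional."},
            {"role": "user", "content": f"From: {email.sender}\nSubject: {email.subject}\n\n{email.body}"},
        ],
    )
    return response.choices[0].message.content
```

The design point is that the call happens at ingest time, not at read time, so latency is paid before the user opens the message.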

[00:07:48] Lorilyn McCue: Here's a good example. So one of our instant reply challenges is somebody says goodbye, like they're like, "hello, you know, it's been wonderful. I'm leaving the company." And everybody's like, "Oh my gosh, I'm so sad. I miss you. Best of luck. Best of luck." And then our instant replies sometimes would be like, "dear Jonathan, thank you for saying goodbye to so-" And we're like, "Oh come on. You got to be smarter than this. Like, please figure this out." And so we actually did a lot of shenanigans to make sure it understood which message to respond to. Like, what was the core of it? What is the focus of the conversation? And so figuring out a model that was really good at this but also wouldn't take like seconds to get the response back, was a pretty big challenge. Now it's easier because, you know, there's some really great models that are really fast and we get excited whenever a new model is released, it's like, "okay, let's update this. Let's figure out what's possible now."

[00:08:49] Raza Habib: What I would love to hear is, if you could just talk me through like your process for building a feature like that and what the journey is. But things that popped into my head as you went along the way was, who's doing, say, the prompt engineering for creating that summary? Because domain expertise is gonna be so important in the style of summary. Something that I often bring up with people is that there's no correct summary for a given document or an email. It depends on who the audience is and what they care about, and tone of voice matters. So I'd be really curious how you guys think about those challenges of who summarization is for. And then, you know, you said when a new model comes out, like you want to try it out immediately. I'd also love to know the things you do to know if things are actually working better. Like, how are you solving those problems? But so those are things that popped into my head, the meta question is like, you know, how do you go and build this feature? And if you could touch on those points along the way, I'd be grateful.

[00:09:39] Lorilyn McCue: So let me first talk about who's prompt engineering. So one of the things I've heard you talk about in your podcast is that it kind of democratizes who's writing the prompt. Like, yeah, I'm writing the prompt. Like, the lone me, who is not coding is doing the prompt engineering for those features.

[00:09:55] Raza Habib: Well, you are coding now, right? I mean, that's a key part of the application.

[00:09:59] Lorilyn McCue: Yeah, which is incredibly powerful. So for me, I have spent like many nights up late, taking my dataset and tweaking the prompt, running it - okay, this is still a problem - tweaking the prompt, running it - okay, this is still a problem - just like again and again and again, until we have a prompt that passes the standards that we have for the particular dataset that we have.

[00:10:25] Raza Habib: And roughly how big is that dataset that you're mapping it over? And what are you, how are you kind of scoring it? Are you just doing it by eye? Like, what's that cycle look like?

[00:10:35] Lorilyn McCue: So we've gotten a lot better with this. So we started with me just going through and saying, this is acceptable, this is not acceptable. Obviously, at some point that doesn't scale. So then we started using LLMs as a judge and creating scores that would say, okay, you know, is this replying to the right person? Is this addressing the main focus of the question? Is this hitting the main bullet points?

Eventually we actually have like a more solid dataset that actually says like, this should respond to so-and-so, the focus of this conversation is this, and I've gotten a lot of help from QA with this. So our QA engineer has really risen to the occasion to help us say, okay, here are the weird edge cases, here's the right answer. We still use LLMs as a judge for a lot of this. Obviously we have to, but getting a little bit more specific in that dataset, and what's like the correct output has been really helpful for us.
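A hypothetical illustration of the LLM-as-judge pattern described here, not Superhuman's actual rubric or code: a judge model scores each drafted reply against the labeled expectation (right recipient, main focus, key points), and a pass rate is computed over the dataset. The prompt wording, criteria, and model names are assumptions.

```python
import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading an AI-drafted email reply.
Answer with JSON: {{"right_recipient": bool, "addresses_main_question": bool, "covers_key_points": bool}}

Original thread:
{thread}

Expected focus (from the labeled dataset):
{expected_focus}

Drafted reply:
{reply}"""

def judge_reply(thread: str, expected_focus: str, reply: str) -> dict:
    """Score one generated reply against the labeled expectation."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder judge model
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            thread=thread, expected_focus=expected_focus, reply=reply)}],
    )
    return json.loads(response.choices[0].message.content)

def pass_rate(cases: list[dict]) -> float:
    """cases: [{"thread": ..., "expected_focus": ..., "reply": ...}, ...]"""
    scores = [judge_reply(**c) for c in cases]
    return sum(all(s.values()) for s in scores) / len(scores)
```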

[00:11:32] Raza Habib: Okay. And I slightly cut you off, but so you are the primary prompt engineer, and you spend quite a lot of time running over test cases, looking at them by eye, finding errors, tweaking the prompt. Can you give any examples of the kinds of changes you're making to a prompt? I've done a lot of this work myself. I suspect it's not intuitive necessarily to other people.

[00:11:50] Lorilyn McCue: So we definitely repeat things often, so we'll say something and then say it again. This is still so silly to me, and I don't, to this day, know if it works. But when we were chatting with OpenAI, one of the examples they had actually, like, had all caps for certain things to really emphasize them, and we're like, does shouting at the LLM really work? We'll try. So we have tried that as well. Shouting at the LLM in our prompt feels weird.

[00:12:20] Raza Habib: It's interesting. It's not crazy that that would work. It's a shame that we still have to do that. But I suspect the reason why something like that is working is probably because, sadly, in the model's training set somewhere, it's more often the case that if someone's shouted at they follow the instructions afterwards, than if they don't. And as a result, the model's like mimicking what's in the training set. And if that happens to be the case, it'll show up in the instruction tuning stage of the models, like the post training that they do, should hopefully iron those things out. But obviously, some of them make it through.

[00:12:52] Lorilyn McCue: It's just absolutely fascinating. Definitely, what helped us the most was including few-shot examples. So having at least one example of a good output that, for sure, helped us a lot.
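A small sketch of how a few-shot example and a repeated core instruction might be wired into a summarization prompt; the wording and the example email are invented for illustration, not the production prompt.

```python
FEW_SHOT_EXAMPLE = {
    "email": (
        "Hi team, the offsite moves to Thursday 10am at the Mission office. "
        "Please confirm attendance by Tuesday and bring your laptops."
    ),
    "summary": "Offsite moved to Thursday 10am (Mission office); confirm by Tuesday.",
}

def build_summary_prompt(email_body: str) -> list[dict]:
    system = (
        "Summarize the email in ONE line for a busy professional. "
        "Keep names, dates, and links. "
        # Repeating the core constraint (and occasionally capitalizing it) has
        # anecdotally helped instruction-following, per the conversation above.
        "Remember: the summary must be ONE line."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": FEW_SHOT_EXAMPLE["email"]},
        {"role": "assistant", "content": FEW_SHOT_EXAMPLE["summary"]},
        {"role": "user", "content": email_body},
    ]
```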

[00:13:03] Raza Habib: And something that happens for me when I do that process, and I'm curious to know if I'm weird in this or you experience this as well, is that during the process of literally doing that workflow, of looking at test cases and then tweaking the prompt, I feel like I'm getting a better idea of what I think good looks like. So often at the start, actually, I'm not 100% sure. And so, like, I was doing this the other day for cold outreach messages, like I was trying to sort of create a prompt for drafting messages, and I would realize, oh, actually, the LLM never includes a call to action. Like, I always want to have a call to action, so I would add in a piece or something like that. And there was a sort of co-evolution, when I do these things, of the specification almost alongside the development.

[00:13:48] Lorilyn McCue: Yeah, I think this is probably one of the key things of prompt engineering and developing AI products, is to learn as fast as possible. Always optimize for learning. So get something in there, and then you're able to look at the output and say, "Oh, I can see something wrong here." Like, "oh, it's really important that we include links in the summary." Oh, this edge case. It turns out we're trying to reply to the last person that emailed about saying goodbye to so-and-so. Okay, let's get clear on this. And it's weird, because for some reason, we just can't think of all these edge cases and all these details before we see the output. And there's something really magical about like iterating your way into a good prompt.

[00:14:34] Raza Habib: And do you do that in just development, or do you find yourselves doing that in production as well? Do you have a system for finding edge cases from production and using those to drive improvements?

[00:14:45] Lorilyn McCue: Yeah, so we have like a thumbs down, thumbs up. I'm sure most product features have, and so yeah, we'll look at the thumbs down responses, as long as someone has agreed to share that with us, and we'll figure out, "Oh, yeah, sure enough. Like, this is an issue," and so we do look at that feedback afterwards, just to make sure we're keeping an eye on quality and like catching any of the other edge cases that might be out there, because there's a wide variety of emails. Like, there is no lack of variety in the kinds of emails that people want to send and get, for sure.

[00:15:17] Raza Habib: So I know we've jumped all over the place, but I do want to try and kind of walk through the journey at least once of what your experience is in building one of these features. So maybe we can jump back to the beginning, and we know the prompt engineering stage comes into it. But what is, you know, how do you go from idea to deployed feature? Typically, like, what does that journey look like for you guys?

[00:15:35] Lorilyn McCue: Yeah, so we look at our user feedback, we figure out like what's the root cause of the problems. You know, basic product manager stuff. We figure out the problem. Then what we do is we take that angle and we match it with, okay, what's possible. And that's when we start prototyping. We start figuring out like what can be done. And then we add one more angle, which is scale, okay, like, what's the scale on this? Like, are we doing this when somebody requests it? Are we doing this for every single email?

Okay, with those three things - the problem, what's possible, and what kind of scale we're looking at - then we move forward with the spec and figuring out, okay, what is this going to look like once we go into the execution phase? You know, that's where the prompt engineering comes in. It's talking with the backend engineers and the other engineers, making sure that everything's piped in appropriately, making sure this feature is really embedded into the product, which is super important to us, as I mentioned earlier.

[00:16:35] Lorilyn McCue: Then we release it internally. Usually we release it internally, just to our AI team, because of how wonky things are in the beginning. And then we just test the heck out of it, find all the edge cases, build up our dataset of examples. Once we iron some of that out and it's at a good enough state, we release to the company, and then again, increase the dataset, figure out the edge cases, then we release to our beta users, then we release to a larger set of beta users. Keep on getting feedback, and then eventually we GA it.

[00:17:00] Raza Habib: That's super interesting. Something that jumped out at me about that process is the extent to which you're essentially like launching it first and learning by kind of doing these staged releases to wider and wider groups. How important is that to the process?

[00:17:15] Lorilyn McCue: So important. This is the learning. Learn fast, learn early. Optimize for learning. We will often launch a product internally months and months before it goes into the hands of users. I mean, I think that's pretty common for products, but it's especially important for AI, just because we don't know all the edge cases ahead of time; we can do our best. But really it's through building up that dataset, figuring out what our eval set looks like and what our standard is for the next level. We want to make sure we're at 80% on emails for the next milestone, at 90% on emails for the next one, at 95% after that. We're probably never better than 95%, let's be honest. But you know, we're doing our best each step of the way just to figure out how to iterate our way there.
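The milestone thresholds Lorilyn mentions (roughly 80% for the company release, 90% for beta, 95% for GA) could be encoded as a simple gate over the eval set; the code below is a hypothetical sketch using those numbers, not Superhuman's release tooling.

```python
STAGE_THRESHOLDS = {
    "internal": 0.0,   # the AI team dogfoods anything
    "company": 0.80,
    "beta": 0.90,
    "ga": 0.95,
}

def ready_for_stage(eval_results: list[bool], stage: str) -> bool:
    """eval_results holds one pass/fail per case in the eval set."""
    pass_rate = sum(eval_results) / len(eval_results)
    return pass_rate >= STAGE_THRESHOLDS[stage]

# Example: 92 of 100 eval cases pass -> ready for beta, but not yet for GA.
results = [True] * 92 + [False] * 8
assert ready_for_stage(results, "beta") and not ready_for_stage(results, "ga")
```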

[00:17:59] Raza Habib: And that's kind of like getting the first feature into the hands of customers, presumably, doesn't stop there. Like, is there a process in place to keep improving these things over time? And what does that look like?

[00:18:10] Lorilyn McCue: What we've kind of been doing is releasing and then going back to improve past features, and then releasing a new feature and then going back to improve previous features. So right now, we just released a feature, and we're going to go back to some of our previous features to improve on them, improving the performance, improving the quality, improving the user experience. Often there are external forces that force this - a new model releases. Every time a new model is released, then we are changing our prompt, optimizing again, fine tuning our dataset for the new model, sometimes we'll be able to improve the experience in that moment.

[00:18:50] Raza Habib: And is it consistently the case? So that as new models come out, your features kind of just naturally get better, or because I know that behind the scenes OpenAI and Anthropic and whoever else is trying to improve their models, but they're improving them on lots of dimensions. Lots of dimensions that maybe don't necessarily correlate with the things your users might care about. So in your experience, like, is it just every time a new model comes out, it just gets better for you? Or do you have to update prompts a lot? Like, does the style change, you know? Is it really just the case that your features magically get better with someone else running a model, or is there a lot of work involved to update?

[00:19:21] Lorilyn McCue: That's a great question. Yeah, actually, every time they release a new model, it gets better for us, thank goodness. I don't know if that's gonna end eventually, but for our use cases, as long as we fine tune, it keeps getting better.

[00:19:33] Raza Habib: That gets me to a question that was on my mind, I think, more so for email generation than for almost any use case, people will care about tone of voice, especially if you're synthesizing emails for them, and they'll care about the tone of voice, you know, being close to theirs. How do you guys, or do you guys, do anything specific to achieve that?

[00:19:52] Lorilyn McCue: So voice and tone is incredibly important to us. We really want to make sure that your emails that are AI generated sound like you and really reflect your own voice and tone. So there's a couple of ways we do that. The first is that the moment that you activate AI, we're going to be analyzing the previous emails that you sent to develop a voice and tone style guide. Once we have that, then we're actually going to get one step more specific. If you have emailed somebody already, what we're gonna do is, when you email them again, or you put them in the to line, we're gonna right away send in a few examples of emails you've sent to that exact person.

[00:20:34] Raza Habib: So you're getting specific few-shot examples in the prompt from that person each time. So it's not that you're fine-tuning the model in terms of updating the weights on each person necessarily. You're actually giving custom few shots so that the model can kind of match their style exactly. Okay, that makes sense and answers a question I was going to have, because if you did have to have fine-tunes on a per customer basis, and you might have millions of customers, that's gonna sort of be, you know - you can do it with things like LoRA or kind of these low-rank adapters, but it would be hard to do with OpenAI. So I was curious how you were achieving that. Okay, it's amazing that that works really well, and fantastic to know.

[00:21:14] Lorilyn McCue: That does mean that, let's say you were really, like, kind and polite to someone the last three times you emailed them, and this time you really want to be angry, maybe it's not going to pick up that change in tone and voice as well. So maybe there are some downsides, but if you're consistent, then it's going to perform very well.
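A minimal sketch of the approach Raza restates here: no per-user weight updates, just a style guide derived from sent mail plus a few recent emails to the same recipient dropped into the prompt as examples. The function, field names, and prompt wording below are invented for illustration.

```python
def build_write_with_ai_prompt(
    instruction: str,
    style_guide: str,
    recent_emails_to_recipient: list[str],
    max_examples: int = 3,
) -> list[dict]:
    """Compose a draft request that imitates the sender's voice and tone.

    style_guide: derived once from the user's sent mail when AI is enabled.
    recent_emails_to_recipient: pulled when the recipient is added to the To line.
    """
    examples = recent_emails_to_recipient[-max_examples:]
    system = (
        "Draft an email in the sender's voice.\n"
        f"Sender style guide:\n{style_guide}\n\n"
        "Recent emails the sender wrote to this recipient:\n"
        + "\n---\n".join(examples)
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": f"Write an email that says: {instruction}"},
    ]
```

As noted in the conversation, the trade-off is that the draft will follow the established tone with that recipient rather than a one-off change of register.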

[00:21:30] Raza Habib: Okay, that's cool. I wanted to spend a little bit of time talking about Ask AI, because I think that's a feature that is qualitatively different from the others that you guys have. So I've seen demos of this. It's really powerful. It's not just search over the inbox, but it's search where I can have reasoning built in. It's quite fast. So, you know, when I saw Rahul do a demo, it was like, you know, get me the emails that have been feedback from customers that are longer than a certain length that, you know, have come in the past couple of weeks, or something. And he got that back, which I thought was really cool. Do you mind sharing a little bit of the journey of what it was like to build that feature, how it differs from the others? And, you know, maybe what was hard about it, what you learned along the way?

[00:22:11] Lorilyn McCue: Yeah, that was definitely a feature where I was not the prompt engineer. I was not the person figuring out where we were storing our vectorized emails or what kind of embeddings we were going to use. This was a good AI-engineer-led adventure, and so what we did is we had an engineer prototype the experience using Vercel, and we used multiple different models and multiple different ways of storing emails to figure out what combination really worked the best.

Once we figured that out, then we started working on the UI inside of the product, and then worked on figuring out, okay, let's get all of our internal emails embedded. Let's get the prototype going. Actually, for a while we were using just my emails. So we just had only my emails, and we were just testing all of our queries on, like, Lorilyn's emails. Getting past that was great, because there's only so many kinds of emails that I get. So it was really nice to expand to the entire company and figure out all the fun edge cases that were there.

So once we had a prototype out, we actually went to beta really early, so we released this in early access and let people come off the waitlist early in the summer. And this was because of that philosophy we have to optimize for learning. And what we did is we looked at the feedback that we were getting from users and said, "Okay, people really want to be able to do more complex searches with this." We thought they might want to do more agentic things, you know, like creating events and writing emails. And they do want to do that, for sure, that is something that they want to do. But what we saw is like, "hey, I want to be able to look across my inbox at a much bigger level, a much broader level, and figure out, like, the trends that I'm seeing." This helped us prioritize how we were improving the feature and what kind of tools we were prioritizing making next.

[00:24:11] Raza Habib: And when you say tools, these are APIs that the model can use in building itself, or tools as in like things that the customers have access to?

[00:24:19] Lorilyn McCue: Tools as in like we have one prompt to categorize the query, and it'll send you to the normal Ask AI search tool. Now we have a tool that is creating an event, so that will send you right to the Create Event tool, and then that tool has a more specialized prompt for handling that query.

[00:24:40] Raza Habib: Right? Okay, so that's kind of a router - the first model is kind of like figuring out which of the tools that are available to me should I be using, maybe with something like function calling, and then there's a family, a menu of options that it has access to, and you're adding to that over time. Exactly. Okay, that makes a lot of sense. And how are you adding the extra metadata to these things? This may be going too much into the weeds, and tell me if it is. But, you know, if I ask a question that doesn't just talk about the content of the email, but also talks about the time frame, or, you know, the example that Rahul had where he was looking for messages beyond a certain length, or something like that, how are you guys adding that metadata into the emails?

[00:25:21] Lorilyn McCue: So what we are doing is, again, using feedback that we're getting from users and deciding what metadata they want to see. You know, we started with a certain set of metadata, obviously, like from time, we eventually added some more rich metadata, such as the name of the attachment that you're referring to. There's some- oh, man, there's so much metadata that we want to add that we haven't gotten a chance to quite yet, because a lot of it is changing pretty rapidly. For example, read receipts. We don't currently have metadata for like, when somebody has read your email, but people definitely want to know, like, "hey has so-and-so opened my email yet?" Or, "show me what emails people haven't opened that I sent in the last week."

[00:26:02] Raza Habib: It's interesting. The answer you actually gave was a lot richer than I expected in terms of how much metadata gets added. And I think there's a generalizable lesson there for people who are trying to build RAG-based systems or trying to do kind of question answering in these complicated ways, which is: all the 101 demos are always like, you just take the content, you embed it, you stick it in a vector database, and then you just retrieve. And the reality, in practice, clearly, from what you're saying, is that there's a lot more nuance to the types of things that people are retrieving. So yeah, the semantic content of the message really matters. But all of this extra content about the message - not the message itself, but whether it was read, the time, and so on - also affects the search, and somehow has to be put in there too.

[00:26:42] Lorilyn McCue: Yeah, exactly.
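As a rough sketch of the routing-plus-metadata pattern this exchange describes, using OpenAI-style function calling; the tool names, parameters, and model are illustrative assumptions, not Superhuman's actual schema.

```python
from openai import OpenAI

client = OpenAI()

# Each tool routes the query to a specialised prompt/pipeline downstream.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "search_email",
            "description": "Semantic search over the user's indexed emails.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string"},
                    "after": {"type": "string", "description": "ISO date lower bound"},
                    "before": {"type": "string", "description": "ISO date upper bound"},
                    "from_sender": {"type": "string"},
                    "has_attachment": {"type": "boolean"},
                },
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "create_event",
            "description": "Create a calendar event from the current email.",
            "parameters": {
                "type": "object",
                "properties": {"title": {"type": "string"}, "start": {"type": "string"}},
                "required": ["title", "start"],
            },
        },
    },
]

def route(user_query: str):
    """Let the model pick the tool and extract structured filters from the query."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder router model
        messages=[{"role": "user", "content": user_query}],
        tools=TOOLS,
    )
    return response.choices[0].message.tool_calls
```

The arguments the router extracts (sender, date bounds, attachment flags) would then act as metadata filters on the vector search, rather than text that has to be matched semantically.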

[00:26:45] Raza Habib: So Lorilyn, you know, building with LLMs is pretty new for most companies. It has to be, right? The technology has not been around for that long. If you were to try and sort of spin this up at a new company, or you're giving advice to a product manager who was doing this for the first time, maybe yourself a year ago, like, what would be the things that you wish someone could have gone back and taught you, or advice you'd give to someone new?

[00:27:06] Lorilyn McCue: I would really focus on two principles. Principle one is optimize for learning. Principle two is integrate AI seamlessly into the product. What's interesting is those are a little bit contradictory principles, I think. So let me kind of dig into each of them.

So with optimize for learning, ideally, you create a lightweight way to test the functionality with internal and eventually external users inside of the product. In Superhuman we have a couple of areas for doing that. One is in our Command K. So if you hit Command K, it kind of opens a little bar, and you can type commands into that. So for example, our feature Instant Event, that we just launched today, we first released that inside of Command K, so you could just hit Command K on any email and type "create event with AI." And then from there, we were able to test it internally a bunch of times and see what we were finding, increase our eval set, and then eventually release it to users and see what kinds of things they were finding.

So there was this really lightweight way to learn inside the product, but what's really important is that you get to that second principle. It's really easy to have just a chatbot on the side of your app and then be like, "Oh, the functionality is there. People can ask all the questions that they want." You can go into Ask AI and say, "Create Event from this email," and it will work. But what's way more powerful is what we've done: you can hit B, which was the traditional command for creating an event, and now that creates an event with AI. That's so much more discoverable. Or, as one of your instant replies at the bottom, if we detect a date, we say, "Create Event for that date." That is incredibly discoverable; that's seamless. You don't have to think about it. You don't have to say, "Okay, I go over into the chatbot, I open up the chatbot, I type, Create Event from this email."

[00:29:06] Raza Habib: And there's not, there's not some little star button that says, here's where the AI magic is. It's hopefully, in this case, the user is not thinking about the fact that they're using an AI feature. They're just getting on with what they want to do.

[00:29:17] Lorilyn McCue: Honestly, my dream is that people use AI features and don't realize it's an AI feature. That is, that's like, chef's kiss. That is exactly what I want to happen. That is the dream. Whenever I'm in a user interview and they say, "Oh, I didn't realize that was an AI feature." I'm like, perfect. That's exactly what I want you to think.

[00:29:38] Raza Habib: And why is that?

[00:29:41] Lorilyn McCue: Because the goal here is efficiency. The goal here is getting your brain from doing the stupid stuff back to the stuff that actually matters. I don't want your brain to have to think, "Okay, how do I find this? What am I doing? What's powering this?" I just want you to think, "Oh, another email about scheduling an event. Oh, look, a button says Create Event on this date. I click it, it's done. I hit save. I move on with my life. I click E, go to the next email."

[00:30:07] Raza Habib: I want to jump back to both of these points. So optimizing for learning and also kind of the seamlessness of integration, one reason why I think so many companies have gone for chat type interfaces, despite them obviously not satisfying this criteria that you mentioned is that they bake in some fault tolerance. So the nice thing about chat is, if it's not perfect, the first time, there is some opportunity for the user to correct the AI. Have you found other UX paradigms for bringing in feedback or correction from users?

[00:30:38] Lorilyn McCue: Well, we've done that a little bit inside of Superhuman. If you thumbs down a summary, we'll actually generate you a new summary.

[00:30:45] Raza Habib: Okay? So you just automatically do that if people aren't happy with it. So on this sort of point of like optimizing for learnings, I completely resonate with that. I think more so than a normal product development. It should be a sort of fire, ready, aim type of setup. But how does that manifest in practice for you guys? Like, how are you guys optimizing for learnings?

[00:31:09] Lorilyn McCue: Yeah, I think it's releasing to ourselves before we're comfortable doing so, as I mentioned earlier, like we are pretty brutal on our own AI team. We're like, "Okay, this is coming. Here we go, get testing." I think also having a really tight beta group. So anytime I have a close interaction with a user or somebody writes in with some incredible feedback about AI, I'll ask them, "hey, this was really helpful talking to you," or, "like, hearing what you had to say. Will you join our AI beta group?" And I have like a personal relationship with that person, and then I'm able to give them a feature that's still got some cracks in it, and then they're able to give me feedback that is so helpful in improving the feature. So I think having just, like, a trusted group of people who are willing to just deal with a little bit of jank is pretty helpful.

[00:32:07] Raza Habib: Okay, so to summarize back the advice you would primarily give yourself if we were able to jump back a year ago, or give to someone new joining: one, optimize for learning. So releasing very early, building in feedback mechanisms, that staged release process you described of internal beta, wider beta, first customers, all customers, and having feedback baked into the end user application. One thing that struck me, which you didn't say explicitly but was implicit, is that you're looking at the data a lot yourself, because you mentioned looking at these edge cases and running the models. And the other thing that was, I think, novel was making sure that it's seamlessly integrated into the flow, so people aren't having to think about the fact that they're using AI; they use the product as they normally would, and stuff just kind of magically happens for them behind the scenes.

[00:32:54] Unknown Speaker: Yeah, right time, right place, right thing. You know what? Let me-

[00:32:57] Lorilyn McCue: Let me clarify one thing, when we have users thumb- not users, when we internally thumbs down or thumbs up examples, we add that to our dataset. We don't do that with users' data, but we do that with our own internal dataset, so that that becomes our eval set automatically, so that we have a more robust set of test cases.

[00:33:18] Raza Habib: Okay, so you kind of, the way you build your test case set is from that initial feedback that you get from internal testing.

[00:33:24] Lorilyn McCue: And think about it like, you know, the sales person at your company, do they have time to send you a screenshot and say, like, "I didn't really like this. This performed poorly." No, they don't have time for that, but they do have time to be like, "don't like it. Thumbs down." And then from there, we can say, "great. We can take a look at it. We can dig into it without even really having to talk to them, if you know it's not necessary."
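A minimal sketch of the internal loop described here, where a thumbs-down from dogfooding automatically becomes a new eval case; the field names and JSONL storage are assumptions, not Superhuman's pipeline.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path

EVAL_SET = Path("eval_cases.jsonl")

@dataclass
class FeedbackEvent:
    feature: str       # e.g. "instant_reply"
    input_text: str    # the email thread shown to the model
    model_output: str  # what the model produced
    rating: str        # "up" or "down"

def record_internal_feedback(event: FeedbackEvent) -> None:
    """A thumbs-down during internal dogfooding becomes a new test case.
    (Per the conversation, end-user data is not collected this way.)"""
    if event.rating == "down":
        with EVAL_SET.open("a") as f:
            f.write(json.dumps(asdict(event)) + "\n")
```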

[00:33:45] Raza Habib: And one other question that I ask a lot of people, I'm kind of curious, how you at Superhuman do this, is, what's the makeup of the team that's working on this? So you know, how many people with machine learning expertise versus domain experts versus normal software engineers, you know? Like, what's the ideal makeup in your mind?

[00:34:02] Lorilyn McCue: Yeah, so we have a couple of AI-specific engineers. We have one backend-heavy, more infra engineer. We have two to three, I guess, not four, frontend engineers. Again, we have to have that because if it's seamlessly integrated into the product, then our designer is creating these really rich, perfectly designed experiences - so obviously, a designer too, as I just said. We have a QA engineer, who, as I mentioned before - I think QA is an incredible addition to any AI team, and their role can go beyond just testing the feature inside of the product. Their role can expand to fine tuning. Their role can expand to creating the dataset. Their role can even be prompt engineering sometimes. Yeah, I think the QA function within an AI team is particularly interesting.

[00:34:58] Raza Habib: And maybe one that's different or bigger than in traditional software, where there's a deterministic output.

[00:35:04] Lorilyn McCue: Yeah, yeah. You can't like set up like an automated test, like, is the pixel there? Like, God knows where the pixel is going to be. You know, it might be over here might be over- I mean, obviously we're not dealing with pixels here, but, you know, the words aren't going to be the same every time. They will never be the same every time.

[00:35:18] Raza Habib: Relatedly - so we've covered the team. Another question that was on my mind was about prioritization. So email is one of those places where I think we could probably sit down and come up with a ton of different possible AI features very quickly. And I'm sure like loads sound like good ideas, and some probably end up not actually being good ideas in practice. How did you guys choose where to start?

[00:35:38] Lorilyn McCue: Yeah, so it really gets back to my obsession with efficiency. I just want to figure out where to save you the most time. So we will literally look at: how much time do you spend writing emails? How much time do you spend reading emails? How much time do you spend writing a subject line? How much time do you spend on X, Y, Z? Okay, now let's take that and figure out where's the biggest optimization upside, and then we really prioritize based on time saved.

[00:36:06] Raza Habib: Okay, that makes sense. Yeah, very natural, like the metric that matters most.

[00:36:09] Lorilyn McCue: Natural, also very hard to measure, by the way. But yeah, it is. It is a really good one.

[00:36:14] Raza Habib: All right. So now I guess, like stepping back from Superhuman itself, for a little bit, I'd be also just curious about yourself in terms of how you manage learning and staying up to date with things in AI. Obviously, as I said, it's a new field for everyone. It's also very technical. It's also changing really quickly. How have you learned about it, and what do you do to stay on top of stuff?

[00:36:35] Lorilyn McCue: So I've actually found a lot of value in AI newsletters.

[00:36:38] Raza Habib: Do you want to shout out anyone in particular?

[00:36:41] Lorilyn McCue: Yeah, I'll give a shout-out to AI Secret. I think they're really useful. I found them particularly helpful. You know, Twitter is great - or, sorry, X is great - but, you know, there's a lot there, and you've got to figure out, okay, what's worth checking out here. So I do appreciate a filtering of what's important. Obviously, I keep an eye on the companies themselves: what is OpenAI doing? What is Anthropic doing? What are all the players doing? We also have a team channel. There's quite a bit of interest internally in AI, so I actually find that my team will find stuff that I haven't found. And so we have like an AI ideas channel, and we'll pop in there and say, "oh, did you guys see this? Check out this cool company." Or, "this is a new thing that I saw," or "I tried this today," and that's been pretty fun as well.

[00:37:24] Raza Habib: And in that process, have you learned anything that you think is underappreciated, that you think more people should know about?

[00:37:32] Lorilyn McCue: Everything! I knew you were going to ask me this because you've asked it to everybody, so I was gonna get ahead of you: is AI over or under hyped?

[00:37:39] Raza Habib: All right, so for listeners who haven't heard me ask this a million times, the question that I always ask, and I do get a variety of answers on it, is, you know, is AI over or under hyped? And I'm sensing, Lorilyn, that you've got a passionate answer to this one.

[00:37:52] Lorilyn McCue: I have a passionate answer to this. I cannot count on my fingers and toes the number of use cases that I would have never thought about people using AI for in the last month. Let me give you some examples. Example one: a friend of mine from grad school that I talked to today said, "Hey, have you heard about Notebook LM? I use it to make podcasts for my children." And I said, "Excuse me?" She's like, "Yeah, you can just put in something like a PDF of - I don't know, my kids right now are really obsessed with World War Two, okay? Create a podcast about World War Two." Fantastic. Keep them off their screens. Have them listening to something while we're on a car ride or while we're on a plane. Like, okay, I'm sorry, but would Notebook LM ever have thought about that use case?

Like, I know what I'm gonna do. Here it is, example number two. Sorry, these are all kind of parenting examples, just because that is where I am living in my life right now, in parent-landia. So my kid, again, obsessed with World War Two. I can't explain this, guys. I'm sorry, but he asked me, "Mom, I want a coloring page about World War Two." So I go into ChatGPT, and I'm like, "Okay, give me a kid-appropriate coloring page of a World War Two battle, okay?" It outputs. He looks at it, and he's like, "no, no, no, I need less tanks, more men." Okay, right? "Less tanks, more men, okay, more planes, more planes." Bam, suddenly I have this perfect picture for a six-year-old of a World War Two battle with less tanks, more men, and more planes, and I'm able to print it out, and I have a kid that's, like, happy, you know. I'm sorry, that's like a crazy use case.

Third, this was great. Remember that little feedback model that I told you about? I was recognizing that our groups of, like, reasons for why something wasn't working weren't sufficient. I wasn't able to get the data that I wanted from that. So I said, "Okay," and I wrote in Slack, "let's take this away, add this, remove this, change this wording," and I just listed it out and made a Linear ticket. My engineering manager pasted this into Cursor, and within minutes it was done - incredible, just from the ticket description itself. I mean, to me, that's like - I guess maybe I could have done that, you know, but to me, that's really, really amazing.

Yeah. I mean, there's just use cases left and right, that people are encountering, people who are curious, people who want to have a little bit of adventure, and say, "Okay, I have this need. Can this new thing do it? Let me go into the unknown. Let me try it." There's so much out there that we can do with what's already there. We don't need to get much better to have like an infinite number of use cases. So I think under hyped, there we go.

[00:40:35] Raza Habib: I'm inclined to concur, and I also like that the way you answered that question was very concrete, right? I think when people talk about over hyped, under hyped, or whatever, it's possible to get very abstract, but there's just a lot of mundane utility. Like, increasingly, I find this, and I think others find this too: we're already just using the models more and more every day, and they are getting better. And so, yeah, you know, as you've probably heard me say with almost every guest, I'm also inclined to believe it's massively under hyped. The reason I think it's under hyped is because I think people underestimate the rate of progress. But what I like about your answer is that, even if there was no further progress, it's in some ways under hyped just because of the lack of imagination that people might have about how many possible use cases there are. And I think that's really exciting.

[00:41:21] Lorilyn McCue: If your end state is like AGI, yes, it's over hyped. We're not there yet. I don't know if we're gonna get there anytime soon. I love science fiction. Am I obsessed with AGI? Do I want Daneel the robot to come into the world and save humanity? Yes, I do, but it's not that important to me right now. There's so many other things to do right now that that is what's interesting to me. Like there is an adventure to be had immediately, and I want to have it.

[00:41:51] Raza Habib: And what a lovely note to end on. So thank you so much for coming on, Lorilyn.

[00:41:55] Lorilyn McCue: Oh, thank you. This was a real pleasure.

[00:41:59] Raza Habib: All right, that's it for today's conversation on High Agency. I'm Raza Habib, and I hope you enjoyed our conversation. If you did enjoy the episode, please take a moment to rate and review us on your favorite podcast platform, like Spotify or Apple Podcasts, or wherever you listen, and subscribe; it really helps us reach more AI builders like you. For extras, show notes and more episodes of High Agency, check out humanloop.com/podcast. If today's conversation sparked any new ideas or insights, I'd really love to hear from you. Your feedback means a lot and helps us create the content that matters most to you. Email me at raza@humanloop.com.

About the author

Name: Raza Habib
Role: Cofounder and CEO
Twitter: @RazRazcle
Raza is the CEO and Cofounder at Humanloop. He was inspired to work on AI as “the most transformative technology in our lifetimes” after studying under Prof David MacKay while doing Physics at Cambridge. Raza was the founding engineer of Monolith AI – applying AI to mechanical engineering, and has built speech systems at Google AI. He has a PhD in Machine Learning from UCL.