We can unlock valuable data from meetings and conversations

Sam Liang, chief executive and co-founder of artificial intelligence transcription start-up Otter.ai, has a plan to save us all from endless, boring meetings. His company is working on personalised AI avatars that will one day be able to attend online meetings on their owner’s behalf. 

Founded in 2016 and based in Mountain View, California, Otter.ai has evolved from a simple voice-to-text transcription service to offer automatic recordings of live events, meeting summaries and content searches. Liang says he envisages Otter as a productivity tool that can improve attention and save everyone time. It built its speech recognition and summary service in-house and uses third-party large language model partners to provide an AI chatbot.

AI Exchange

This spin-off from our popular Tech Exchange series of dialogues will examine the benefits, risks and ethics of using artificial intelligence, by talking to those at the centre of its development

The start-up, which last raised $50mn in 2021, claims that it is approaching 20mn users but has not provided information on how many pay for its service. In 2022, it imposed new limits on free users, offering a maximum of 600 minutes of transcription per month. Paying customers receive far more. Competition in the sector is growing. Big Tech companies, such as Google, offer their own audio transcription services. Google is also working on a project to create avatars in video conferencing.

Liang was born in China and moved to the US in 1991. He received a PhD from Stanford University before joining Google, where he led the search giant’s location services. His first start-up was acquired by the Chinese ecommerce company Alibaba.

In this conversation with Elaine Moore, the FT’s tech comment editor, Liang describes access to audio data as a new way to break down the silos in any business. 


Elaine Moore: Can we start by talking about your plan to create AI avatars for meetings? How’s that going to work? What kind of data will be required and does it mean that eventually we won’t be required for meetings at all?

Sam Liang: The first step is to collect a large quantity of data from the user. The data can come from many different forms . . . the most important is meeting data. 

[Take] the meetings I have had in the last seven years. I talked to venture capitalists; I talked to customers; I obviously have tons of internal meetings with our own teams: sales team, marketing team, recruiting team, engineering team. So that’s a huge amount of data we can use. We want to use some other data as well. For me, we could share Google documents I wrote, or other memos, some of the emails, some of the Slack messages.

The more you learn about the user, the better the avatar can be. Then, we inject all this into the training system and build a model that emulates them. 

Of course, we need to test this and evaluate this system, so we have asked our colleagues to test drive the avatar. They may ask it a number of questions, or we just send the avatar to a regular meeting and see how it performs. We have our prototype we’re testing. It is far from perfect, so there’s still a long way to go. But it’s very promising.

EM: Is the idea that avatars will be able to speak as well as record what’s going on?

SL: Oh, yeah, absolutely, absolutely. The simplest form of meeting is a one-on-one meeting. So we can start with that. Another one we’re working on is what we call a sales agent. We train a sales agent that can talk to a customer, and explain the product, and answer customers’ questions. That’s another form. An avatar tries to emulate a specific person, but an agent can either emulate a person or use the knowledge of multiple people collectively.

Sam Liang, chief executive of Otter.ai, at May’s GenAI summit in San Francisco © Paul Morris/Bloomberg

EM: You’ve said in the past you could envisage a world in which somebody is recording their entire day-to-day existence on Otter. Were you serious about that?  

SL: Longer term, it is a goal. Short term, we’re focusing on business and meetings. But we see that [a] valuable conversation can happen at any time: it can happen in a hallway when you meet someone; it can happen at Starbucks. 

I find that a lot of valuable [data in] conversations are missed. I’d love to have Otter be present at any time and capture everything. So, although, again, we’re focusing on the business use cases, this can be used in personal life as well. 

Actually, I’m using Otter when I’m having a conversation with my sons. We are empty nesters: one of my boys is in college; the other one is working in New York City. It’s really hard to get a hold of them now. I have to beg them to have a call with me! So, whenever I have a call, I see that as very precious and I use Otter to capture it.

EM: Are you using Otter as a memory device to help you search for things said in past meetings? Or for something else?  

SL: It’s mostly memory. We created Otter AI chat. So, I can use Otter AI chat to query all my past meetings. Actually, you and I had a conversation on August 15 and, in order to prepare for this meeting, I reviewed our call to refresh my memory. 

That was a meeting I was part of. But, in our company, there are hundreds of meetings every week. Obviously, I cannot go to every one, but there’s a lot of information that’s valuable that I would love to have. So I use Otter AI chat to query our company meeting database. 

One good example is the calls our sales team have with our customers. I query the sales meetings every week to understand better what our customers are looking for, what their pain points are, what their problems are, and what their workflows are.  

EM: On the subject of being able to see notes from meetings you didn’t take part in, there was a report that one Otter user accidentally received a transcript of a conversation that took place after he’d left a meeting. How do you think about user data security and privacy?

SL: We definitely take security very seriously. We totally understand that voice conversations are extremely sensitive and security is paramount, so we provide a lot of measures to protect user privacy. All the data is encrypted and we have a strict access control system. This system is actually not much different to Google Docs: the user controls who has access. If you accidentally share it with people you didn’t want to share it with, you can always remove their access. And there are different levels.

The incident you discussed, I wouldn’t say it’s AI specific. It’s actually a hot mic situation that can happen to anyone. In this particular situation, as far as we know, after the meeting had finished some participants dropped off but other participants continued to talk without being aware that [the meeting] was still being captured on Otter and the notes were being shared with all the participants. So that’s how it happened. 

In the sharing mechanism, we warn the user in advance that ‘Hey, this note is being shared. So only talk about things you’re willing to share.’ 

We’ll definitely improve the product to make it more prominent and more intuitive. But the user does need to take some responsibility to use the tool correctly.

EM: You worked at Google in the early 2000s and I read that you were the designer of the blue dot that shows where we are on Google Maps. Is that where you got the idea to create a company that can organise recorded information? Because, in Google Search, it’s still quite hard to search for information in audio or video clips. 

SL: I worked on Google Maps and location platform for four years between 2006 and 2010. I left Google in 2010 to build a start-up in Palo Alto that would track mobile location and then analyse the data to provide personalised mobile services. After we sold that company, I realised that voice data is very similar — in the sense that the majority of voice data has never been captured. 

I forget a lot of things and it’s really hard to search and recall information that has been heard. So we decided to work on this problem, to collect as much audio data as possible, and help people to solve their memory problem. 

It is a sharing problem. If you think about enterprise, so many meetings are happening in each department but most of the meetings are not shared with people in other departments. So that creates a lot of information silos that make the enterprise less efficient and less productive.

EM: Otter was founded in 2016. What’s the fundraising environment like right now? How does it compare to a few years ago? 

SL: We raised our last round in 2021 . . . it’s been more than three and a half years. We have been super efficient. Because our users grow organically, we didn’t need to spend too much money acquiring more. And revenue is growing very rapidly. 

So we haven’t had the urgency to raise a new round. But we are seeing that the venture capital community is getting more active now — especially after the Federal Reserve cut the interest rate. I see the sentiment is much more enthusiastic. You saw that with OpenAI doing a new round valuing them at more than $150bn. 

There are a lot of other start-ups getting a lot of new funding. Many of them are really good AI companies. But the market is a little bit frothy at this moment. 

It somewhat resembles the internet bubble era. Many of these companies will die and only those that have core AI technologies, that build a unique business model, can survive. Many young start-ups don’t have their own core AI technologies. They just call some third-party APIs [application programming interfaces] and build a very thin wrapper over. Unless they build some strong user or data model, they can easily be replicated by other companies. 

We build our own speech recognition technology. We build a lot of proprietary AI technologies. And, you know, we have processed over a billion meetings, so we have a tremendous amount of meeting data that can help fine-tune and enhance the AI models we build.

So we have built an AI flywheel that we can leverage to continue to grow rapidly. For AI start-ups to survive or thrive, they have to build their own AI system, and they have to have huge amounts of data they can leverage. 

EM: Are you concerned about the competition? 

SL: There are already a lot of competitors. Obviously, we see competition from two directions. One is large tech from Microsoft, Zoom, Google and others. They control the video conferencing system. However, against them, we have a lot of advantages. We are much more nimble. We’re much more agile . . . we are platform agnostic. We not only support one video conference [platform], we support all of them. And we also have a really strong mobile app that people use for in-person meetings. None of the Big Tech [companies] actually focus on mobile, in-person meetings. 

And the other direction is, of course, there are a lot of other small start-ups. There are at least a dozen meeting assistant start-ups out there. But none of them is as big as us. [We] have a much bigger user base and a much bigger data set than all the other start-ups.

Of course, new start-ups are being born every day. We are watching the market and seeing what other start-ups are doing. We just have to move super fast.

EM: How do you think you can preserve your niche? 

SL: Our price is very competitive. But that’s not the most important [thing]. The most important is product quality, product features, and the user experience.

[Take] Google as an example — they have an infinite amount of cash. They have 100 times more of certain types of engineers than us. But, if you look at Google in the last few years, there’s no new interesting product coming out. They just [don’t have] the right product mindset. So this is why we’re not afraid of large tech. Our product is much more user friendly . . . the AI chat we’re providing allows you to query all the meetings in your system. We still haven’t seen that from Google, Microsoft or Zoom, so we are way ahead of them already. 

In terms of pricing . . . many other start-ups who don’t own their own AI model, [and] who have to call third-party APIs to do speech recognition and other AI algorithms, have to pay a much higher price to use that API. That really hurts their profit margin. So for us, we do have an advantage because we own a lot of models ourselves, and can keep our price low.

4hrs: Average time saving per week claimed by users of Otter.ai

EM: Are you focused on enterprise customers right now? Or is the focus on expanding the total number of users?

SL: We support both. We have our freemium model that allows individual users to use Otter on their own. Most of these users are professional workers. And we leverage this huge user base to get into enterprises. This bottom-up system is very similar to other successful SaaS [software-as-a-service] companies, like Dropbox or Slack. They have a lot of organic users who penetrated large enterprises. Then, later, they use that user base to aggregate them and create enterprise contracts. 

EM: You had a very rapid increase in users during the pandemic. Has the pace of growth slowed since then? 

SL: It continued to grow rapidly, especially this year. Actually, it’s a little bit slow in the summer, when people are taking vacations. But it seems late August, September, so far, we’ve seen record growth. It’s both user growth and revenue growth. So, awareness of AI and overall AI adoption is getting stronger and stronger. More people are realising AI can really help them.

EM: Finally, what do you say to potential business customers who might be concerned about hallucinations or accuracy when it comes to using a transcription AI service for sensitive meetings? 

SL: We can build our model and manage our model parameters to minimise hallucinations. It happens less and less often now. Of course, people do need to double-check important numbers and important facts themselves. But the pros definitely outweigh the cons.

We recently did a survey of more than 600 professional users of Otter. They say they save four hours every week, on average. So people can use those four hours to relax and maybe have more family time. Or to do a lot more work. I think that’s more valuable and, maybe, they can tolerate a little bit of hallucination. 

This transcript has been edited for brevity and clarity
