PolyAI’s Nikola Mrkšić on making a voice assistant people love

The Georgian Impact Podcast | AI, ML & More
Episode 14 | September 16, 2022 | 00:32:45

Hosted By

Jon Prial

Show Notes

In this episode of the Georgian Impact Podcast, we talk to PolyAI CEO and co-founder Nikola Mrkšić. 


PolyAI is a conversational AI company spun out of University of Cambridge research that builds voice assistants at scale in multiple languages. You'll get a helpful breakdown of the challenges of building conversational AI, what makes PolyAI different, and you’ll hear a demo of the tech in action.

 

You’ll Hear About:

 

●  Nikola Mrkšić and PolyAI’s vision 

●  The accuracy needed to ensure customers don’t become frustrated 

●  How PolyAI has surpassed previous generations of voice tech

●  Mapping non-linear customer journeys

●  How PolyAI develops conversation depth with their clients

●  How outbound calls differ

●  The ways in which PolyAI differentiates itself

●  Having the voice match the brand

●  Where Nikola sees voice tech going in five years  


Episode Transcript

[00:00:04] Speaker A: Hi everyone, and welcome to the Impact Podcast. I'm your host, Jon Prial. Today we're chatting with PolyAI, a conversational AI company spun out of University of Cambridge research that builds voice assistants at scale in multiple languages. Now, maybe you're thinking voice assistants, voice response units, and you're maybe not thinking the most positive things, but let me tell you, things have changed. As CEO and co-founder Nikola Mrkšić puts it, the technology has progressed a lot. In this podcast, you'll get a helpful breakdown of the challenges of building conversational AI and what makes PolyAI different, and you'll get to hear a demo of the tech in action. First, let's hear Nikola introduce himself and the PolyAI tech.

[00:00:49] Speaker B: Hi, my name is Nikola Mrkšić. I am the CEO and one of the co-founders of PolyAI. We are a London-based conversational AI company building voice assistants that provide superhuman customer service over the phone. They sound exactly like humans, and they understand humans no matter how they speak, be it in different dialects or in different languages, with background noise, or with kids screaming in the back of a car. We're really trying to give voice assistants a better name and to turn them from a technology that people really hate interacting with into something that they'll feel good about when they encounter it, the same way you feel great when you see a really good mobile app or a really smooth website that's had a really good customer journey built into it. And when you see it, you know that that enterprise, that brand, has your best interests at heart. And that's our vision; that's what we want to achieve.

The company is a spin-out from the University of Cambridge, where I did my PhD together with my co-founders. We all worked under a guy called Steve Young, who is one of the most successful speech recognition researchers of all time. Steve worked on building just pure speech recognition for a decade or two, and then at one point he realized that to build systems that we can have a conversation with, we're going to need to do more than just speech recognition. We're actually going to have to build robust multi-layer systems which compensate for the mistakes that speech recognition is always likely to make, because it's a moving target. Right when it starts working for a native speaker of English, then we want it to work for me, then it has to work for me from across half of a room on a speakerphone, and then with background noise, and then with more difficult accents or very complex vocabulary. So much the same way that computer networking had to build a very complex multi-layer system for communicating between two nodes on a very complex computer network, communication between humans and machines is also a really hard problem that requires a lot of really complicated technological modules in between to make it a seamless experience where you end up enjoying that conversation.

Personally, I'm Serbian. I was born in Belgrade, grew up there, came to the UK to study, and almost ended up in investment banking. But at the last minute I met this group of guys at Cambridge who were starting a company called VocalIQ. That was the previous spin-out from the research group that I would later join, where I would meet my current co-founders. So really, the group has a long legacy of spinning out advanced technologies from Cambridge into the real world. And, yeah, we spent a few years there. Apple acquired us to make Siri more conversational.
That team is now one of the largest Apple offices outside of the United States. But after a few years, we left and started PolyAI, because we think the best use of this technology, and the place where it's most needed, is not actually the consumer voice assistant, things like Siri or Google Assistant or Alexa. It's really customer service, because increasingly you need more and more customer service and support for increasingly complex products. With our aging populations and low unemployment, people don't want to do these jobs anymore. And yet voice is the preferred channel. Through Covid, we had people who thought that everything would go to chat, but that really didn't happen, and we have higher call volumes now than we did pre-Covid. So at this point, we have no choice but to make these technologies work really well if we want to have really good customer service.

[00:04:30] Speaker C: I don't know if it's an accurate number or not, but I've heard that it took a while just on the speech recognition side; it had to be very accurate for people not to hang up. 95% accuracy on speech recognition wasn't good enough; people are going to hang up. You had to push it further and further. How accurate do you feel the responses have to be now, in terms of giving customers a good result, so that they don't hang up or get frustrated?

[00:04:59] Speaker B: It's complicated, right? Because the word error rate, the percentage of words that are transcribed incorrectly, is one metric to look at, and the first metric, because that's the first layer: are you turning voice into text accurately? Afterwards we have to look at: did we understand it right? And then: did we provide the right answer? Now, on word accuracy you could cross 95%, which is human-level performance. But that's not the whole story, because which 5% of words you get wrong really matters. If you omit an article, it's really not a problem for understanding the essence. Whereas if you omit a key number or a date or an address name, then you're in real trouble, because you're not going to be able to extract the meaning subsequently. So, yeah, you could push towards higher levels of performance, and I think that historically, many companies did just hope to get ever better speech recognition that would then make the natural language understanding, the bit that comes afterwards, a lot easier. That's one way to attack the problem. The other one is to say: hey, if I have a separate module, almost like an autopilot module, on top of speech recognition, then I have context. I know whether I've asked you for a date, for an address, for a number, for the name of a medicine you need, for a bunch of other things. And if I know what I asked you, then I can really do a lot to guess what it is that you said, even if the output of the speech recognizer is not perfect. So to look at the accuracy that's needed to facilitate pleasant human-machine conversation: sure, if you had 100%, it would be just fine. But otherwise, what really matters is how cognizant you are of whether you are likely to have understood the user or not. Because if you don't hear something, and you ask again very quickly, and you just make it clear that, hey, I wasn't able to understand that, could you say it in a different way? That's not as frustrating as "I didn't catch that. Repeat that," which shows very little contextual sensitivity to the fact that the caller is clearly having a sub-perfect customer experience.
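Nikola's "autopilot module" point is easy to picture in code. Below is a minimal, hypothetical sketch of context-aware reranking over an ASR n-best list: the dialogue manager knows which slot it just asked for, so it prefers the hypothesis that actually parses as that slot type. The class, patterns, and threshold are assumptions made for illustration, not PolyAI's API; a production system would use trained parsers rather than regular expressions.

```python
# Hypothetical sketch: rerank imperfect speech recognizer output using
# dialogue context (which slot was just asked for). Not PolyAI's API.
import re
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str          # one candidate transcription from the ASR n-best list
    confidence: float  # recognizer confidence in [0, 1]

# Toy per-slot validators; a real system would use trained parsers.
SLOT_PATTERNS = {
    "date": re.compile(r"\b(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b", re.I),
    "party_size": re.compile(r"\b(one|two|three|four|five|six|seven|eight|\d+)\b", re.I),
}

def interpret(n_best, expected_slot, threshold=0.4):
    """Return (value, None) on success, else (None, clarification_prompt)."""
    pattern = SLOT_PATTERNS[expected_slot]
    # Prefer the most confident hypothesis that actually parses as the slot.
    for hyp in sorted(n_best, key=lambda h: h.confidence, reverse=True):
        match = pattern.search(hyp.text)
        if match and hyp.confidence >= threshold:
            return match.group(0), None
    # In doubt: re-ask in context, not a generic "I didn't catch that."
    slot_name = expected_slot.replace("_", " ")
    return None, f"Sorry, I wasn't able to get the {slot_name}. Could you say it a different way?"

# The top hypothesis garbles "four" into "for"; context recovers the value.
n_best = [Hypothesis("a table for for people", 0.55),
          Hypothesis("a table for four people", 0.52)]
print(interpret(n_best, "party_size"))  # ('four', None)
```

The fallback is the design point: when nothing parses with enough confidence, the system re-asks in context rather than blaming the caller.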
[00:07:22] Speaker C: I love that. I love your discussion of error rates and accuracy. There's a tremendous, just a short, video on your website where someone's speaking, and on the bottom of the screen you see that you're extracting what's relevant: making an appointment, the time or the date. And you're right, the articles don't matter; the key is extracting these contexts. And I hadn't really thought about the degree of interaction. But I guess one of my biggest frustrations with the old-fashioned IVR was you have to listen to the menus, and they've decided in advance what they want you to do. It's almost like a faceted way of working through the system, and I find that very frustrating. You've really gone a step beyond that, right?

[00:08:05] Speaker B: Yeah. Typically, the first generation of this technology was "press one for credit cards, two for debit cards, three for mortgages, four for car loans." And you wait and you wait, and eventually, hopefully, the thing you called about is there, and you navigate your way to where you want to go. Now, the next iteration of that was voice IVR, where you would say, hey, I'm calling about my mortgage, and that would hopefully automatically select that number four. But it was still very much a tree-based system where you would have to just hope that what you say is mapped to one of the options. And if it's not, it actually led to much higher levels of frustration, because you wouldn't know what to do. You're there, helpless, and what then happens is misrecognized options would actually get people routed to different contact centers. You're calling about a mortgage, but you end up talking about car insurance. At that point, those calls are really expensive for the contact center and really bad for the customer experience. What we do right now is create fully fluid conversational systems where you express yourself freely, completely open-endedly. You don't have to guess what kind of IVR tree is behind it, because it's not really a tree anymore. It becomes a highly dynamic graph that can't even be visualized in 2D, because it skips to different parts of the conversation. It measures confidence intervals for the system, so that we know whether there is doubt over what we think you want, and then we're able to say, hey, was that about mortgages or was it about multiparty insurance?

[00:09:43] Speaker C: Right, but it could be mortgage payments, mortgage interest rates. You've got the ability to get the customer down the mortgage branches of the tree. But if you still have some context that you don't know, if they say XYZ and you don't have XYZ, you end up in the same boat, where you just send them to a representative. So at the end of the day, you still need to know all the branches of the tree that you want, correct?

[00:10:06] Speaker B: Correct. You need to know exactly what you need to do for each of the possible outcomes. So you do need to map out the customer journey, but you can create a substantially more nonlinear one. Say someone were to say: hey, I'm calling about renewing my mortgage, and I'd like the term length to be 20 years, and I want it to be a fixed tracker mortgage for five years. You could parse that and just go: cool, well, look, the interest rate for that one would be that much. Great. Rather than: okay, mortgages; what is the term length?; okay, would you like a fixed or variable rate mortgage? Because that's how those systems typically work. So now this just allows you to really speak in a much less constrained way.
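The "dynamic graph with confidence intervals" Nikola describes can be sketched as a small routing function. The thresholds, intent names, and action format below are invented for illustration: the idea is just that the system jumps straight to a node when it is sure, asks a targeted disambiguation question when two intents are close, and hands off when nothing is plausible.

```python
# Hypothetical sketch of confidence-gated intent routing. The thresholds
# and intent labels are illustrative assumptions, not PolyAI's design.
def route(scores: dict[str, float], accept: float = 0.8, margin: float = 0.15) -> dict:
    """scores maps intent name -> classifier confidence; returns an action."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top, p1), (second, p2) = ranked[0], ranked[1]
    if p1 >= accept and p1 - p2 >= margin:
        # Confident: skip directly to that part of the conversation graph.
        return {"action": "goto", "node": top}
    if p1 >= 0.3:
        # Doubt between two plausible intents: ask a targeted question,
        # not a generic "I didn't catch that."
        return {"action": "clarify", "prompt": f"Was that about {top} or about {second}?"}
    # Nothing plausible: hand off to a human rather than loop on errors.
    return {"action": "handoff"}

# A caller utterance that could be about renewing or about repayments.
print(route({"mortgage_renewal": 0.46, "mortgage_payment": 0.41, "car_insurance": 0.05}))
# {'action': 'clarify', 'prompt': 'Was that about mortgage_renewal or about mortgage_payment?'}
```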
[00:10:45] Speaker C: So in terms of ROI, I'd say one of the benefits I immediately hear is that we'll get the customer to the right answer sooner, where I feel like the current systems are there to keep you away from the customer reps for as long as possible. I feel like if you installed a fresh IVR, it would default to the message that says "please listen to the menu options, because they've changed," even if they haven't, because it makes people listen. Or "please understand that due to high call volumes, you're going to be delayed." And I think they're lying to me. I just think they want me to wait on the phone longer.

[00:11:22] Speaker B: I think there are definitely examples of that. When you look at just implementing this technology, there are many ways to think about ROI. One is you are producing a much better customer experience, and that can't always be quantified, but it leads to substantially higher customer satisfaction, and in the end, it trickles down to revenue in many different ways. Number one is you never leave people waiting, so they're much less likely to churn as customers, because they just don't feel like frustrated clients. If you're unable to help them with the voice system, and the system is able to quickly detect that and pass them on to humans, you now have a really good hybrid contact center where, say, 50% of the queries are dealt with in an automated way, and the other half now go to a contact center that's no longer way oversubscribed. So you're helping your contact center by making their lives a lot more livable. And there is a lot less churn on the agent side there, so they stay for longer, they become better trained, and again, your customer experience improves.

But then there are other things that are more industry-specific. So, for instance, we work with a number of hospitality companies, hotels like Marriott, or large restaurant groups in the UK like Whitbread and Greene King, and we help them never miss a call. Now, these are establishments that take calls in a distributed way. It's not a contact center; it's the reception of a hotel or the front desk staff of a restaurant. And ideally they'd pick up every call, take the reservation, answer a question. But in the current state, where labor shortages are hitting everyone, especially the hospitality industry, on both sides of the pond, they tend to miss between 30 and 50% of their calls. If you're missing that many, some people will call again, some people might walk in, some will go online and do it in a digital way; that's what people are hoping for. But the data we have suggests that anywhere between 3% and 5% of the people will just not show. And that's it: you've lost 3% to 5% of your revenue. So when we started working with these companies, we thought we were selling convenience and just a faster, better customer experience. But in the end, it ended up being massive revenue generation. And to this day, hospitality remains our largest vertical, because we're just able to do things for them that are hugely valuable from both a customer experience and a top-line perspective.

[00:13:54] Speaker C: It sounds like it's critical, as you think about the corpus, the trees that you have to define. And I'm really impressed, again going back to that thing on the website, where someone was making a restaurant reservation and said, "I have children." The response was, "Do you need a high chair?"
Obviously somebody with industry knowledge put that there. How do you work with your customers to get that level of depth and insight? Obviously it helps knowing the vertical you're in, that it's a restaurant reservation versus a hotel reservation versus just a general query. And I want to come back to general queries, but let's talk about that vertical case: how do you develop that level of depth?

[00:14:31] Speaker B: So for all the industries where we have sizable deployments, and that would be hospitality, financial services, telco, government (banking is huge), we have a number of templates for different clients, where they can start building their own thing and customizing it. And then with some clients, our solutions consulting teams would go in, understand their contact center, and propose the full design of the agent. In other cases, the clients themselves are able to use those templates to customize their solution. And in some cases it's a partner, like an SI, delivering those solutions to large clients, and these are companies that know their clients' processes really well, that are basically part of their IT team. So we're pretty good at mapping those processes, or teaching our partners how to map them, to build a really good voice assistant. But there's no shortcut: someone somewhere is going to have to understand exactly what a good customer experience should be before we're able to turn it into an incredible human-machine conversation.

[00:15:27] Speaker C: So that's fine, and obviously that makes a lot of sense. And of course, things evolve. Let me talk a little bit about learning. There's a particular car company that has implemented voice commands, and they don't tell you what commands are there. They say you can do things such as, and the example would be, "change the driver's temperature to 68 degrees." They didn't say you could say "I'm hot," and it lowers the temperature, or raises it, or lowers it by three degrees. And it turns out a bunch of users have a shared Google Doc, for the entire world to see, with every command, because the company doesn't want to give you the answer; they want to learn from your question. So is there a learning model as well?

[00:16:09] Speaker B: Yeah, of course. I think that for us, it's really important that we create systems that people speak to in a very natural way, the way that they would just want to speak naturally, because I think that this technology has been held back by systems where you were taught that you have to speak in a specific way, and that immediately changes your overall satisfaction with that interaction. So we try to build systems that you can interact with exactly the way that you would want to interact as a human being, speaking maybe to another human, or just in the way that's very comfortable for you. We try to elicit a response that's not a pre-programmed response, where you have to know how to speak to the system, because that leads immediately to much lower satisfaction. You know, say you were calling me. I would create a system that says, "Nikola, PolyAI, how can I help?" If I pick up the phone that way, well, then you're very likely to just speak to me as if I were a human being. And if I understand, then we're building trust.
And this trust-led model is really important to us, because if you trust that I, as a system, will hand off when I can't do a good job, and if otherwise you trust me to understand exactly what you're saying when you speak freely, then if I misunderstand you in a sentence and ask you to repeat, there won't be much love lost. You'll probably just repeat, and it'll be okay. And we have thousands and thousands, actually millions, of examples of calls that go that way.

[00:17:32] Speaker C: Interesting. The system sounds so good. Do you feel like you have to identify yourself and say, "Hi, I'm your artificial assistant," or however you call yourself?

[00:17:43] Speaker B: That's one of the favorite debating points inside the company. We used to think that we should say it at the start, and then we ran some experiments, and it turns out that if you don't actually lead with that disclaimer, you get people who interact with you a lot more freely. And then if you actually understand them, you build that trust, and you're much more likely to get to a successful resolution of the call. If you say that you're an automated agent, well, a lot of people have had very bad experiences with this technology in the past, and if you evoke all of those experiences at the start, you're just putting yourself on the back foot. So, ethically, if at any point we detect confusion, we will add in passing: hey, I'm an automated agent that can, depending on the use case, help you amend your reservation.

[00:18:35] Speaker C: That's great.

[00:18:36] Speaker B: Or that can help you with your bank account management. But if people ask, of course we come clean. We don't hide the fact that we are an automated agent, but we've seen no upside to deliberately sounding robotic or having long verbal disclaimers at the start.

[00:18:52] Speaker C: Fascinating. I remember, it's historical and I get it, we live with these histories of IVR, but I remember when, I think it was American Airlines, changed their system and put "um"s in. It was clearly an automated agent, but just hearing the "um, let me look that up for you" made me feel better. But you're obviously leaps and bounds beyond that, for sure.

[00:19:11] Speaker B: Yeah, but I think that design aspect is something that a lot of leading teams that had high accuracies didn't really understand. And then those that understood design maybe had subpar systems that weren't accurate and powerful enough to express complex, nonlinear conversations. So we hope to marry those two together into things that will finally elevate the general feeling towards these systems.

[00:19:35] Speaker C: Nice. In all cases, you're reactive, in terms of someone calls in and the phone gets picked up. I'm contrasting that in my head to a kind of proactive call; I think it was Google Duplex, making haircut appointments, that got people all in a bit of a tizzy. Your design is always to stay reactive?

[00:19:56] Speaker B: No, we do outbound as well. Outbound is actually a lot easier, because really, the agent dictates the conversation. If I call you and my whole thing is, hey, Jon, you owe some money, and you know which money you owe; it's for your car, and I'm calling you from such-and-such bank. Then it's: well, okay, are we going to talk about it now? Are you going to give me your payment details? Or are we going to talk about it at a different time, or are you unable to pay? And then we go with one of those and we're done, assuming you picked up the phone, which you probably didn't. So in that case, it's actually a much easier thing to build for, because it's much more directed and less complex than a free-form conversation. It's kind of like chess: it's the opening move. If I say "how can I help?", the board is yours to play. If I start with a particular move, your set of moves is a lot more constrained, so you can do a lot less.
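The chess analogy suggests why outbound is easier to model: the agent's opening constrains the caller's plausible replies to a short list. Here is a toy directed flow under assumed intents (pay_now, pay_later, cannot_pay); it is an illustration, not PolyAI's dialogue format.

```python
# Toy sketch of a directed outbound flow: each node lists the few replies
# the opening move makes plausible. Names are invented for illustration.
OUTBOUND_FLOW = {
    "opening": {
        "prompt": "Hi, I'm calling from your bank about your car loan payment.",
        "expected": {
            "pay_now": "collect_details",
            "pay_later": "schedule_callback",
            "cannot_pay": "hardship_options",
        },
    },
    "collect_details": {"prompt": "Great, let's take your payment details.", "expected": {}},
    "schedule_callback": {"prompt": "No problem. When should we call back?", "expected": {}},
    "hardship_options": {"prompt": "Let me walk you through some options.", "expected": {}},
}

def step(node: str, caller_intent: str) -> str:
    """Advance the directed flow; anything unexpected goes to a human."""
    return OUTBOUND_FLOW[node]["expected"].get(caller_intent, "handoff_to_agent")

print(OUTBOUND_FLOW["opening"]["prompt"])
print(step("opening", "pay_later"))    # schedule_callback
print(step("opening", "who_is_this"))  # handoff_to_agent
```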
[00:20:50] Speaker C: Love it. How about idioms? Obviously, you've been deployed for many years. How do you pick up and learn, and constantly learn? There are things that people say today that I didn't know a year ago, because phrases change and idioms change. So how do you pick up on idioms? We'll stay with one language, and then we'll go to multilingual next. Is there a research team, or are you learning from the interactions? How does that grow and evolve for you?

[00:21:12] Speaker B: We've been working on NLU, natural language understanding, for a long time, especially in the context of dialogue. We were lucky to be among the first to work on deep learning based approaches to all the different parts of a dialogue system, which meant that we got the chance to work on pretraining: ways to consume large bodies of text to get models with very high levels of performance without having specific training examples for the kind of conversation that we're trying to implement, but rather for just all conversations. So for idioms, for very complex expressions, for synonyms, stuff like that, we've seen a lot of text already. Our models are pretrained on millions of conversations that we get from Twitter, Quora, Reddit, from all the sources you can imagine and find online freely. So they've seen a lot of text, and hence they're pretty robust at knowing which idiom is likely to be a good signal for a particular intent. And that's the second piece of it: when you're trying to understand, let's say, one of 200 things that you can say in a banking conversation, there are only so many idioms that will cause confusion between whether it's intent number 47 or intent number 55.

[00:22:22] Speaker C: Right.

[00:22:23] Speaker B: So it's a less difficult problem than the abstract problem of overall deep language understanding, where the academic community as a whole has not gotten really far. But when you constrain the problem, you get to do a lot better.
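A rough sketch of how that pretraining pays off: an encoder pretrained on large conversational corpora places paraphrases and idioms near each other, so a handful of examples per intent can stand in for thousands of labeled utterances. PolyAI has published conversational pretraining work (for example, ConveRT); the sketch below substitutes a generic open-source encoder from the sentence-transformers package, and the intents and utterances are invented.

```python
# Few-shot intent classification on top of a pretrained sentence encoder.
# Uses the open-source sentence-transformers package as a stand-in for a
# conversationally pretrained model; intents and examples are illustrative.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small pretrained encoder

# A few examples per intent, instead of thousands of labeled utterances.
EXAMPLES = {
    "mortgage_renewal": ["I want to renew my mortgage", "my fixed term is ending"],
    "card_lost": ["I lost my debit card", "my card got stolen"],
}

intent_names, texts = [], []
for intent, utterances in EXAMPLES.items():
    for utterance in utterances:
        intent_names.append(intent)
        texts.append(utterance)
example_vecs = encoder.encode(texts, convert_to_tensor=True)

def classify(utterance: str) -> str:
    """Nearest-example intent by cosine similarity in embedding space."""
    vec = encoder.encode(utterance, convert_to_tensor=True)
    scores = util.cos_sim(vec, example_vecs)[0]
    return intent_names[int(scores.argmax())]

# An idiom the examples never mention should still land on the right intent.
print(classify("my plastic has walked off somewhere"))  # expected: card_lost
```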
[00:22:38] Speaker C: That makes a lot of sense, again, because we're talking about focused use cases. So I love it. Now let's talk about multilingual. How many languages do you support?

[00:22:47] Speaker B: Yeah, at the moment we do just over 50.

[00:22:52] Speaker C: Don't say "just"; 50 is a lot.

[00:22:54] Speaker B: Only 50? Yeah. Well, when you're a startup and an enterprise software company, the sad thing is, while we'd love to work on getting coverage for over 200, and we have a team uniquely qualified to understand that Slavic languages have difficult morphology and some African languages have difficult phonetics, and every language has its own challenges, the truth is, it also gets intersected with the number of countries where there are large enterprises that can benefit from this solution. So 50 is, at this point, more than what we need, really. But it's a long-standing passion of ours.

We have one model running across different languages and across all of our different use cases, hence the name PolyAI, because the system is a polymath and a polyglot. And we worked on this stuff for a long time, training things that would train, say, in English with a bit of Italian and German peppered in, and then they would get really high performance overall, and better performance, say, in English, than they would if they had English data alone. So these kinds of inherently multilingual models are something that we really care about and look forward to bringing to life.

The other interesting thing to say about that, especially for North American audiences, is that there's been a lot of voice IVR in the US, a lot more than even in the UK, and way more than in continental Europe. The reason for that is there are bigger companies for whom the previous generation of technology, while it didn't have great customer experience implications, had good economic ROI: opex reductions, reductions in contact center size, because the IVR could handle 10%, 20% of interactions. These are companies like Nuance, IPsoft, Interactions. That previous generation managed to do quite a bit, and in doing so they've kind of poisoned the pool. People in America and in Canada now really hate these systems. You know, Brits are more open-minded, and when it comes to continental Europe, they've barely experienced them. And that offers great potential, because it frequently happens that if a particular geography misses one generation of technology and then the next one comes in, you see much faster adoption than if you had the previous technology. So, for example, fiber Internet is a lot better in Eastern Europe than it is in Western Europe, and that's just because ADSL connectivity and things like that were a bit weaker in Eastern Europe than in Western Europe. So you have these cycles, and I think the hope is that we can do a lot more in Europe now, with more open audiences and consumers.

[00:25:35] Speaker C: It's the same thing as less developed countries bypassing copper wires and going right to wireless. I hadn't thought about that as a huge opportunity. That's great. So here's something I've never done in, I don't know how many years I've been doing this podcast.

[00:25:48] Speaker B: Probably close to ten years.

[00:25:49] Speaker A: We haven't had a product demo on the pod before, but we're going to let you listen to a demo of PolyAI in action.

[00:25:55] Speaker C: Here it is.

[00:25:58] Speaker D: Cafe Bridge, Portsmouth. How can I help you today?

[00:26:01] Speaker E: Hi there. Can I book a table for this coming Sunday for four people?

[00:26:08] Speaker D: Let me just have a look at what space we have. Okay. We have a table on the 24th of July at 1:30 p.m. for four people. Is that all okay for you?

[00:26:19] Speaker E: Yeah, that's great.

[00:26:20] Speaker D: Great. Shall I go ahead and book that in for you?

[00:26:24] Speaker E: Please do.

[00:26:25] Speaker D: Can I have your full name, please?

[00:26:27] Speaker E: Yes, my full name is Emily Schaefer.

[00:26:31] Speaker D: Can you spell your last name for me?

[00:26:34] Speaker E: Sure thing. It's S-C-H-A-E-F-F-E-R. Schaefer.

[00:26:42] Speaker D: Can we reach you on this number?

[00:26:44] Speaker E: Yes, you can.

[00:26:46] Speaker D: Perfect. I'm just waiting to see that go through, and that's done. Is there anything else that you need to let me know? Any allergies or other notes that I should make on your booking?
[00:26:59] Speaker E: Yeah, someone in the group is celiac. Can you make a note of that, please?

[00:27:04] Speaker D: We cater to dietary requirements, but all our food is prepared in the same kitchen. I'm just making a note on your booking now, but please remind someone from our team when you get here.

[00:27:15] Speaker E: Yes, I can do that.

[00:27:17] Speaker D: Is there anything else I can help you with?

[00:27:20] Speaker E: I think that's everything. Thank you.

[00:27:23] Speaker D: Enjoy the rest of your day.

[00:27:24] Speaker C: Bye.

[00:27:26] Speaker E: Thanks. Bye bye.

[00:27:27] Speaker C: So let me ask a little bit about differentiation. Tell me about your team and where your differentiation comes from.

[00:27:36] Speaker B: Both.

[00:27:36] Speaker C: Obviously, with the people coming out of Cambridge, you've got real talent, but how do you feel like you differentiate your product?

[00:27:42] Speaker B: There are really three main pillars of differentiation. The first one is, while we do both voice and chat, and we are an omnichannel solution, when it comes to voice we are really committed to building incredible voice experiences. So we have these spoken language understanding modules that allow us to reach incredibly high levels of performance over voice channels, over the phone, which is quite a lossy channel with different callers interacting with it in different ways. That's one piece, and we talked about that quite a bit already. The second piece is, when you start building up an application for a new domain, you typically, with other solutions, need thousands of examples to get to a respectable level of performance when training that algorithm. We're able to do it with one to two orders of magnitude less data, because we have such powerful pretrained models.

[00:28:36] Speaker C: Tremendous.

[00:28:37] Speaker B: Another advantage of that is, well, this goes in different ways. We're cheaper, and especially on the implementation side, we charge a fraction of what others would, because we can just do it really fast. The second one is that we can do it really fast: it's not a six-month project. We've deployed systems in a few weeks, in two or three weeks, with quite a few clients, and that's not something that's very typical for this technology. The third bit is we don't need a huge amount of your data, which may contain personally identifiable information. So from a compliance perspective, getting going with us is a lot easier and has fewer administrative hurdles than with other vendors. Then there's design. As you heard in that demo, we make sure that our systems sound really human, to elicit that engagement. And the final bit is that it's actually a platform. It's not really a managed services solution in the same sense that the previous generation of this technology was. It's something where we provide a template; customers then use it to create their own solution and finalize it, and then they're able to have a much lower total cost of ownership, because they don't get an invoice from PolyAI every time they need a bit of a revamp or a change to a specific part of the system.

[00:29:46] Speaker C: Got it. So let's end with a couple of fun questions. Question number one: what about the voices that people expect to hear? I'm not sure I'm comfortable; I think Siri allows you to put your friend's voice on it. What's your expectation of what people want to hear?
[00:30:03] Speaker B: Oh, we are really big on getting the branding right. So, say, if it's a Texan company, we'll have a Texan voice. If it's a Scottish company, we'll have a Scotsman or a Scotswoman speaking. We really work hard to capture the brand, so that the vocabulary, the mannerisms, the speed, the humor reflect the brand. And that's really big. I'd say, next to all of our technical advantages, this is one of the main reasons that our clients like working with us.

[00:30:31] Speaker C: Great. Nikola, I'd really like to hear your thoughts: where do you think we'll be in five years with this technology? Kind of predict what would be your vision of where things might look. You could pick the year, but X years out, what do you think things will look like?

[00:30:46] Speaker B: So look, voice is the dominant channel, and I think it will remain quite a dominant channel. But really, what we hope to achieve is to turn the world into a place where, if you hear a voice assistant, maybe because it has a different dial tone, you're just like, oh, great. The same way you feel when you see a good website and it's like: oh, great, they have a nice website; it's easy for me to do what I want to do. That's the hope for five years. I think that the consumer voice assistants are slowly finding product-market fit, but they've not really become an essential part of many people's lives. I mean, I have them in every room, but I'm hardly representative; I'm really biased towards this technology. But I think it's all progressing, and it's kind of like laptops, right? They snuck up on us. We knew about them for ten years, and then one day we all had laptops and very few of us had workstations. So I think in the same sense, voice assistants will just find different applications, and AR and VR may be a big part of enabling that. I think we'll just end up living in a world where we use them for all sorts of things seamlessly in our lives: in our car, at home, while walking around, with your smartwatch, with your AR headset. It's just going to become a lot easier to do things, and they will start working really well. That barrier of trying, failing, and then getting put off: we've got to go through a few iterations of that to get to a point where it is part of our lives, because it is really easy to express ourselves this way. We are conditioned to express ourselves with our voices, and there's a reason we're speaking now instead of doing this over email.

[00:32:14] Speaker C: That is a great way to wrap this up. I love where things are today, and I'm looking forward to where things are going tomorrow. Nikola, thank you so much. PolyAI, it's a great story, and thanks for spending the time with us today.

[00:32:26] Speaker B: Absolutely. Thank you for having me. It was a.
