An introduction to generative AI with NVIDIA’s Mahan Salehi

Episode 27 | June 16, 2023 | 00:22:05

The Georgian Impact Podcast | AI, ML & More

Hosted By

Jon Prial

Show Notes

On this episode of the Georgian Impact Podcast, we break down the technologies that make up generative AI and how it works. From Large Language Models (LLMs) to deep learning, this episode will help you understand how AI has evolved to get us to this point with GenAI, and what our guest is excited about in the space.


Mahan Salehi, AI and LLM product manager at NVIDIA, draws on his experience working with AI at NVIDIA to explain how the space has evolved.

 

You’ll Hear About:

 

●  Machine learning vs. artificial intelligence.

●  The need for guidance for GenAI models.

●  Rules-based models vs. deep learning models.

●  The two pieces of generative AI.

●  How foundational models are trained.

●  The value of first-party data.

●  Mahan looking back and looking forward.

●  The impact on different industries.


Episode Transcript

[00:00:04] Speaker A: Today, we have a treat for you. We're going to be talking about generative AI, but we are not jumping into that hype cycle, although it's very cool. And I love reading all the new articles that show up every day about how and where generative AI is used, what jobs are lost, what jobs are created, on and on. Now, we may get there in this podcast, but first we're going to step back and we're going to put this into perspective. This is for you. And today, my guest Mahan Salehi and I are going to talk through some history and properly introduce and talk about generative AI. Content matters, but so does context. Now, Mahan is an AI and LLM product manager at NVIDIA, and hopefully, because I think I know my audience, you know that LLM stands for large language models, and they're basically the base models that generative AI is built on. And many of you might have heard the term ChatGPT. That's one example of that. [00:00:53] Speaker B: So let's get started. [00:00:54] Speaker A: I'm Jon Prial, and welcome to the Impact podcast. [00:01:02] Speaker C: The material and information presented in this podcast is for discussion and general informational purposes only, and is not intended to be and should not be construed as legal, business, tax, investment advice, or other professional advice. The material and information do not constitute a recommendation, offer, solicitation, or invitation for the sale of any securities, financial instruments, investments, or other services, including any securities of any investment fund or other entity managed or advised directly or indirectly by Georgian or any of its affiliates. The views and opinions expressed by any guest are their own views and do not reflect the opinions of Georgian. [00:01:38] Speaker A: Mahan, I'm glad to be speaking with you today. Look, a number of years ago, before every car manufacturer had delivered some type of self-driving technology, or maybe they were just talking about it, we had a guest speaker from NVIDIA at a Georgian conference, and the topic was machine learning and self-driving cars. And what blew me away was this video of a car driving along something that I don't believe I could call a road. It was through the woods on a dirt path. It was going around trees. It was really just astounding. So, to step back, semantics matter. I call what I saw in that video machine learning because, in my mind, it was just using the data it had, and it was basically trying to determine what the road was and where the road was going. Do you think I'm right, or is that something we should have called artificial intelligence even then? [00:02:27] Speaker B: That's a great question. I think that when we talk about deep learning versus AI and machine learning, the terminology used can be very confusing. Which is a subset of the other? I think that artificial intelligence, you hear a lot of folks talk about it as a system that could generate a response that's very human-like. So just as good as a human, or close to being as good as a human being. Now, if you want to dive into the technical details of that, you have different techniques for doing this: machine learning algorithms and deep learning algorithms.
And the innovation that happened over the last couple of years that we saw, and this is what led to things like self-driving being a possibility, is that we went from very easy, rules-based models that were simple to use, mostly linear problems or linear functions, that couldn't generalize well on data, that couldn't do tasks they weren't really trained to do, to then being able to do deep learning models and machine learning models that were able to extrapolate on data that they might not have even seen before. And so, in the case of self-driving, you can teach a model to recognize images of the road and forest and things around it. And if it sees enough data over time, then it gets really, really good at recognizing these things, even if it hasn't seen that same exact type of scenario or environment around it. And that's something that we hadn't seen two decades ago, or even three decades ago, with more traditional models, where they were focused on one specific application and they couldn't understand things that they never saw before. [00:03:53] Speaker A: So when Netflix recommends a movie, that's simple, that's really just machine learning. It's not really going too far. [00:04:00] Speaker B: It's years. [00:04:01] Speaker A: But I love that you could feed images of cats and dogs to the model, and it could parse them. And taking that to self-driving, only machine learning could take an image and figure out that it's a tree on the side of the road versus a person. We couldn't program for that. But you're saying, take this to the next level, and I've got deep learning. Now it could begin to extrapolate from that. Now, you still have to give it guidance as to what to do. Correct? [00:04:27] Speaker B: Absolutely. And I think the key thing to understand about deep learning models is that a lot of the problems that we see in the real world are really complex. They're nonlinear relationships. And so deep learning models, the interesting thing about them, and you hear neural networks and buzzwords like that being thrown around, is that the design of how they're created, you could think of it as almost analogous to a human brain, where you have neurons and synapses and the connections between these neurons. And the larger these models get, the more parameters or neurons you add to them, the better they are at being able to look at a bunch of data, build an internal understanding of what they're looking at, and then be able to learn from that and generate the responses that you're looking for. Recognizing images, although it's simple to us now, was not a very simple task 20 or 30 years ago, because as soon as you start to mess around and show a picture of a forest that has different types of trees the model hasn't seen before, or there are clouds in the background, that can mess everything up. And the whole point of deep learning models is that they get so intelligent that they can see the trees through the forest. If they see enough images, they're able to really generalize well and understand what they're looking at.
But then when it beat that human, it created a strategy that no one had ever seen before. I'm shocked by that. But I assume, because it's narrow and there were some rules and it was fenced, they were able to just kind of figure this out along the way. And in my sense, and I may be completely wrong here, this is like AI v2. So how would you categorize those things? [00:06:20] Speaker B: I think that, especially when we talk about large language models, NLP, or natural language processing, is in a gray area here, because we went from, again, rules-based models that could do very, very simple, rudimentary things, to then deep learning models that could learn from a lot of data and do some of the things that you're talking about, which is come up with new and innovative ways of solving problems that we might not have trained them to do. But there are still some limitations. I think, when we talk about large language models, understanding the languages that human beings speak and the patterns that we talk in is a very complicated thing. So you could tell a deep learning model five years ago to translate something from English to French, and it could come up with a way of doing that that would be maybe different from how a human being would translate words. So we might go one by one, word by word, and translate, whereas an AI model might look at the whole sentence and the words around it and say, okay, what does this word mean in relation to another? What's new and interesting about, I guess, v2 AI is this generative capability, because we're going from a place where we're understanding language and understanding data, to now being able to not just understand it, and do a better job of understanding it, but to generate it. And so those generative capabilities are something that we haven't seen before, and it's going to unlock a bunch of new use cases. [00:07:33] Speaker A: That's great. Let's go through the two pieces. My view of generative AI: we have text, and people think of ChatGPT, and we have images, with Stable Diffusion or DALL-E. Help me understand, kind of restate this generative idea, and talk to me like I'm twelve years old. [00:07:54] Speaker B: Absolutely. So that's a great point. Generative AI, when we talk about large language models, is focused on language. But generative AI applies to many different things. We can generate text, we can generate music, we can generate art, images. The foundation of pretty much all these models is a specific type of model called a transformer. It was introduced in a paper by Google and University of Toronto researchers in 2017, which is really the foundation of everything that we're talking about here today. And you can think of it as a new type of algorithm, or a new type of model architecture, that's really good at understanding language-related tasks especially, but also any type of application where you have sequence data, whether that's images or audio or music or anything else, and being able to not just understand what it's looking at, but generate something out of that. And so when we talk about language models, we have things like GPT-3, we have ChatGPT. These models are really powerful because they're very large, and so the larger they get, in some cases, the better they are at being able to do a lot of generic tasks with one type of model and do multiple different things at once.
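To make the transformer's core operation concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the mechanism at the heart of the 2017 paper mentioned above. The tiny shapes and random weights are illustrative assumptions for this page, not anything from NVIDIA's or OpenAI's actual models.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """One attention layer: every token builds a weighted mix of all tokens."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                # queries, keys, values
    scores = Q @ K.T / np.sqrt(K.shape[-1])         # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # attention-weighted values

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                   # toy sizes: 4 tokens, 8-dim embeddings
X = rng.normal(size=(seq_len, d_model))   # stand-in token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # -> (4, 8): one updated vector per token
```

Real transformers stack many such layers, with multiple attention heads and the positional encodings Mahan mentions below so the model knows token order.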
But what they're also really good at is being customized further for a downstream application. So you can take these types of generative AI models and tune them to generate images of cats, or you can create one that's just designed to write Hollywood movies, another one that's designed to talk only to salespeople at a company, or to lawyers. And so that's what's really interesting about them: they're not only general intelligence models that can do lots of different things really well, generalists in a way, but you can then take them and customize them into more expert, domain-specific models. [00:09:34] Speaker A: And the T in GPT is transformer. And you're calling the transformer really that next turn of the crank of technology. [00:09:42] Speaker B: Exactly. And ChatGPT is very big now. Everyone thinks these types of models are brand new. Those of us working in this space have been developing these technologies for a while. So transformers were announced in 2017. But what's different is that there are unique things about their architecture, and we can talk about self-attention mechanisms and other things like positional encodings, but the idea is that they're really good at doing a better job of understanding language, human language especially, but also image data and other types of sequence data, and then being able to be trained on large corpuses, or mountains, of data. And the more data you throw at these types of models, because they run really well on GPUs for parallel processing, the better and more powerful they are at being able to do really cool and innovative things that we haven't seen before, like generate a video of a cat moonwalking and doing crazy things like that. [00:10:35] Speaker A: So let me talk about the breadth of that LLM, or the narrowness of it. It really depends, so I'll give you the example that I'm thinking of. There was a fascinating article. It was a piece where we're going to write something, and I think it had an option of Star Trek or Shakespeare. I forget what the third one was, so I was following the Star Trek one, nerd that I am. So it gave the model a prompt for a Star Trek script, and then it showed you what came out of the model after one pass, and it was nothing but random letters. It would be kind to call it gibberish. And then it ran 100 passes, and okay, it was still gibberish. But after hundreds and hundreds and hundreds, it was showing you after a thousand, after whatever the numbers were, 10,000, all of a sudden you found words, but they were just words. But then you found words that related to each other, and then you found words that were very Star Trekky. So I guess my question to you is, did it just feed it Star Trek scripts, or did it feed it every text there is in the universe and say, but we should weight this towards Star Trek? Because I know these neural networks obviously have weights that programmers put in. How did it get to Star Trek? [00:11:44] Speaker B: Yeah, that's a great point. So these models, the first thing that we do with them, we call them, first of all, foundational models, because you take a transformer model, and what you do, and this is what happened with ChatGPT and many of the other popular models you've heard of, is you train them on a lot of data that you find publicly available on the Internet. So Wikipedia articles, Reddit posts, and so on. And this is done in an unsupervised learning way.
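As a small sketch of what that unsupervised pretraining objective can look like in practice, the snippet below assumes the standard next-token (causal language modeling) setup: the raw text supplies its own labels, with each token's target being simply the token that follows it. The whitespace tokenizer and single embedding-plus-linear "model" are toy stand-ins for a real subword tokenizer and a full transformer.

```python
import torch
import torch.nn.functional as F

text = "to be or not to be"
vocab = sorted(set(text.split()))                # toy whitespace "tokenizer"
ids = torch.tensor([vocab.index(w) for w in text.split()])

inputs, targets = ids[:-1], ids[1:]              # predict token t+1 from token t

emb = torch.nn.Embedding(len(vocab), 16)         # stand-in for a transformer
head = torch.nn.Linear(16, len(vocab))
logits = head(emb(inputs))                       # (seq_len - 1, vocab_size)
loss = F.cross_entropy(logits, targets)          # next-token prediction loss
print(f"cross-entropy: {loss.item():.3f}")
```

Notice that nobody labeled anything: the supervision signal comes from the text itself, which is what makes web-scale pretraining possible.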
So what that means is, we're going to just give you a bunch of data. We're not going to teach you anything. We're not going to guide you. We're not going to tell you, hey, neural network, this is how you understand how to parse a language and learn the relationships between words. You literally just give it a bunch of data and say, figure out for yourself how human beings talk to each other and what is the way in which we communicate. [00:12:25] Speaker A: Could you talk about unsupervised? Just to clarify, if I'm teaching a car how to drive, I have to give it rules, right? Stop at a red light. Is that supervised? [00:12:36] Speaker B: That is supervised. The challenge with a lot of AI models is that you've got to give them an input, and then in the beginning stages, when a model doesn't know what it's doing, you show it what the output should be. It's similar to when you have a toddler and you're trying to teach it what's right and what's wrong. Every time it makes a mistake, you say, hey, no, that wasn't correct. This is what the right answer should have been. And then the toddler learns, okay, this is what I'm supposed to be doing. The AI model does the same thing, but what happens in AI models is it does a bunch of math to update its weights, its parameters, and then it figures out what it's supposed to say and what it's not supposed to say. That's supervised learning. And then when we talk about these language models, the model does this on its own, and it's unsupervised. So I don't give it a bunch of Wikipedia articles and say, you know, for example, this article is about Shakespeare, this is the topic of the essay. It just figures out that this essay is talking about Shakespeare, who Shakespeare is, what it means to write a poem, how human beings compose poems. It figures it all out by itself. And that's what we call unsupervised learning. And so, from there, that's the first step. We just give it all the data we can find on the Internet, so it can learn the basic rules of grammar and human language and how human beings talk to each other. From there, you have a model that can do a decent job of understanding what words mean relative to each other. But then, like you said, you want to be able to narrow down on some very specific applications, like, hey, I want to be able to understand Star Trek jargon. And so the problem there is that a lot of the data I might have trained this model on from the Internet maybe didn't have a lot of Star Trek data. Maybe this is the first time it's seeing it. And sometimes these models, what they're really known for, is being able to still do a good job of generalizing to new data they haven't seen. But the more specific you get, the harder it is to get good results. And so in this case, with Star Trek, what I can do is take this model that already has a lot of good foundational knowledge about how human beings talk to each other, and then I can give it just a couple more data points to show it, okay, here's how Star Trek characters talk to each other. And from there, it's able to very, very quickly understand what Star Trek is and the terminology used there. Think of it almost as teaching a high school student the basics of algebra, and then teaching them, from there, how to solve more complicated problems. They're going to be really good at learning quickly because they've already built up the foundational knowledge they need about math and the rules of mathematics and whatnot.
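As a hedged sketch of that customization step, the loop below fine-tunes an openly available pretrained checkpoint on a handful of domain examples. GPT-2 via the Hugging Face transformers library is used purely as a stand-in for a foundation model, and the Star Trek lines are invented placeholders, not data from the episode.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")   # pretrained generalist
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

domain_examples = [                                    # invented placeholders
    "KIRK: Scotty, we need warp speed now.",
    "SPOCK: Captain, that course of action is highly illogical.",
]

model.train()
for epoch in range(3):                # a few passes over a tiny domain dataset
    for text in domain_examples:
        batch = tok(text, return_tensors="pt")
        # For causal LMs, passing labels=input_ids makes the library compute
        # the shifted next-token loss internally.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```

A real fine-tune would use far more examples, batching, and often parameter-efficient methods, but the shape of the process, generalist in, domain expert out, is the one described above.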
So that's what AI models are really good at. They're able to learn from a lot of generalized data, and then they can dive deep into a very specific thing with only a couple more examples. [00:15:07] Speaker A: Let's say I'm an entrepreneur and I've got first-party data. First-party data must really matter, because there are just gobs of third-party data out there, but anybody can get access to that. But if I'm an entrepreneur and I'm starting a company, and I've managed to do a really good job of collecting this first-party data, and it's very relevant to my customer set, my assumption is I can take this large base, this large LLM, and then I can uniquely augment it with my own first-party data and end up with something very special as a product. Is that right? [00:15:38] Speaker B: Exactly. And we call that process customizing, or tuning, the model, so that you take something that's a generalist and you teach it how to be a domain-specific expert. And the more data you have, and especially if that data is your differentiating moat and something that you have access to which others might not, the more you'll be in a position to end up with a model that can do something other people can't recreate. Because at the end of the day, the data that you have access to is the thing that makes or breaks a lot of these models and how good they are at performing certain tasks. [00:16:09] Speaker A: My market research hat's just like burning up off my head, because I'm thinking about so many things. I could create a medical chatbot and I could scrape WebMD and a million sites, but so can anybody else. [00:16:22] Speaker B: Exactly. [00:16:22] Speaker A: What can I do to be a successful medical chatbot company? And it does sound like everybody needs to be searching for their own secret sauce. [00:16:30] Speaker B: Searching for your own secret sauce, and coming up with clever ways to continuously collect more data, especially if you can find ways to involve your customers in that process. So a good analogy that I always like to give is Tesla. One of the reasons why they're leaders in the autonomous vehicle space, and their cars are able to do a good job of recognizing what's in front of them, is that there's a fleet of Teslas out there that are constantly collecting data from the roads. And when they see something that they don't recognize, or there's an issue and the AI model breaks down, that data is actually sent back to Tesla, and there's a human being in the loop that teaches the model: in this case, you missed the fact that this was a stop sign. I'm going to make a note of this, and I'm going to then retrain that model so it learns. So the next time it's in that same position again, it's able to make sure it gets the right answer. And if you have enough of these cars out there and enough humans in the loop to make these corrections, you end up with a very good feedback loop, where models are constantly iterating and getting better over time, learning in a kind of cyclical, self-improving process. So if you're able to get data that's very unique and differentiated, that's great. If you can find a way to constantly get feedback on how the models are doing, that's even better. [00:17:48] Speaker A: Fantastic. And I guess that's a data moat, which you referenced earlier. So that, for sure, is a data moat. [00:17:52] Speaker B: Exactly. [00:17:52] Speaker A: So how has your work at NVIDIA evolved?
I mean, you talked about 2017, when some of this stuff started. How do you see what you were doing, as you look back on your career, and how do you feel about what you see as you look ahead? [00:18:03] Speaker B: It's definitely been an amazing journey. My break into AI was through the startups that I worked on earlier in my career. And even just from when I created my first company to now, the advancements in all the fields, from computer vision to natural language processing and management systems, have been truly astounding. I think NVIDIA is in a very unique position, because we started off, and we still do work on, building a lot of the hardware that actually allows us to train these models and deploy them. AI models, what they're known for is parallel processing, and GPUs are designed to do parallel processing. That's really the big advantage of them. And so without the hardware, none of this innovation would have been possible on the software side. But what's been really interesting and really cool is that I work on the software team at NVIDIA, the deep learning software team, where we build products that allow customers to take foundational large language models like GPT, and then train them from scratch or customize them on their own data sets, usually proprietary data sets, and end up with a model that's unique to them, and we also help them deploy those models. And so we're really focused on targeting enterprise customers that can't really take something like ChatGPT, which is very generic, and deploy it. They need to take it and customize it on their proprietary knowledge bases, so they end up with a model that works really well for them. [00:19:20] Speaker A: So it's interesting that we're going to take this baseline and we're going to put the right thing on top of it. In the case of medical, of course; there's software engineering; there's marketing collateral. So what do you see, Mahan, as you look across the breadth of industries? Are there industries that you think are going to get there sooner rather than later, or does everyone need to race like hell to win? What's your view of how this is going to go, kind of vertically oriented? [00:19:44] Speaker B: I think what's really cool about these types of models is that the foundational models are horizontal, so you can take them and deploy them for a wide array of applications. As you said, you can take them and adapt them to very specific verticals. And I think that the short answer is every single industry can be impacted by this, as long as there's enough data to customize these models on. But I'll give you some very concrete examples, even from the work that we do at NVIDIA with enterprise customers. We've seen lots of interesting use cases, and I like to focus in on healthcare because I have a background in that space. And so I've seen things where these generative AI models are used for things like generating 3D models of the human body and organs for surgical planning, or to teach medical residents how to perform surgeries in a way that's obviously non-invasive; or to build, essentially, a digital doctor, where a patient can come into a room and talk to an AI system that can understand everything you're saying, help diagnose what kind of conditions you're going through, and even look at you and pick up on visual symptoms.
So models are starting to become more multimodal, and that's what we're also seeing with GPT-4, where they're not only able to take in text as input, but images as well. And then another interesting application is drug discovery. And this is something that lots of people don't really think about when you talk about large language models and generative models. But if you think about it, human language is very complicated to understand, and the language of protein sequences is also very difficult. And it just turns out that transformer models, these types of AI models, are also really good at understanding that. And so now NVIDIA is working with companies like AstraZeneca and others to design unique types of large language models catered toward the life sciences and drug discovery, to develop life-saving vaccines and groundbreaking medications that can really help cure diseases that in the past would have probably taken decades to develop on our own. [00:21:34] Speaker A: Those are great stories. And considering we started talking about text and images, you've just brought them together. This is such a great conversation. I can't thank you enough for giving us your time today. Thank you so much. [00:21:46] Speaker B: It was a pleasure. Thank you for having me.

Other Episodes

Episode 62

November 25, 2019 00:21:39

Episode 62: Is Machine Learning the Secret to Uber's Success?

If you think Uber is a ride-hailing business, you're wrong. It's actually a machine learning business. Machine learning is what makes Uber's service possible...


Episode 2

February 07, 2020 00:27:17

Episode 115: Ghost Work: the Hidden Workers of AI

The Gig economy. We know what that means. Outsourcing jobs. We know what that means. Working remotely. We know what that means. But what...


Episode 56

November 25, 2019 00:27:50

Episode 56: The Fascinating World of Linguistics, Emoji and Conversational AI

Linguistics is a fascinating and very fundamental part of natural language processing and conversational business. In this episode, Jon Prial talks with Tyler Schnoebelen....
