How Explainable AI enables trust with Fiddler.AI’s Krishna Gade

The Georgian Impact Podcast | AI, ML & More
Episode 16 | September 30, 2022 | 00:25:12

Hosted By

Jon Prial

Show Notes

In this episode of the Georgian Impact podcast, we’ll be talking about one pillar of responsible AI: explainable AI. Fiddler.AI founder and CEO Krishna Gade breaks down how explainability provides insight into your training data and how your model is performing, so that if something goes wrong you know exactly where to start looking for a solution, and why that visibility enables trust.


Episode Transcript

[00:00:04] Speaker A: Hello and welcome to the Impact podcast. [00:00:06] Speaker B: I'm your host, Jon Prial. There's been a lot of conversation about the importance of responsible AI broadly, this idea that AI should be free from bias and can maybe impact people and society for the better. There are a lot of pillars that make up this idea of responsible AI, and the one we're excited to dive into today is explainable AI. Explainability provides insights into your training data and how your model is performing. So if something goes wrong, you know exactly where to start looking for a cause and how to create a solution. This transparency means you're not only preventing performance issues, you're avoiding potential negative publicity and maybe even fines. This is something Fiddler AI, our guest for today, is enabling for companies. Based in Palo Alto, Fiddler's platform promises to provide a unified environment with a common language and centralized controls, with actionable insights to operationalize ML and AI with trust. They also call themselves a pioneer in enterprise model performance management. Building trust in AI through transparency is what they're all about. And we asked Fiddler's CEO Krishna Gade to explain what that means from a technical perspective. Spoiler: MLOps is a huge part of it. Here's Krishna to tell us more. [00:01:28] Speaker C: I'm Krishna. I'm the founder and CEO of Fiddler AI. We're a startup in the Bay Area trying to build trust with AI. We've built a product category around model performance monitoring, or model performance management, that helps AI and machine learning teams understand how their models are being built, so that they can build trustworthy AI for their organization. Previously, I used to work at Facebook, working on similar things for News Feed, which is the core product of Facebook, where we built tools that explained how News Feed algorithms worked for both technical and non-technical folks. And that's how I got into this area, and that's how I started Fiddler. [00:02:08] Speaker A: So I understand that when you get more data, you have the ability to make a model smarter, but at the same time, models can change. So talk to me about the balance between data and then models themselves, please. [00:02:23] Speaker C: Yeah, that's a great question. What is an AI model at the end of the day? Essentially, it's a pattern recognition system that is looking for patterns within the data and encoding them into an artifact, called a model, that can then be used to predict the future. So, for example, let's say I'm a bank and I have a bunch of historical data around the good loans and the bad loans that I'm approving, and I know a bunch of different characteristics about my loan applicants: their salary, their debt, their FICO score, and so on. All of these variables will then go into a model training process, where the model learns patterns, which are then encoded in artifacts like neural networks or decision trees, which can then be used to predict the risk of a new loan applicant when they come and apply for a loan with the bank. So at the end of the day, a machine learning model or an AI model is highly dependent on the training data it was trained on. The higher the quality of the training data, the higher the quality of the model is going to be.
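(To make the training process Krishna describes concrete, here is a minimal sketch, assuming scikit-learn and made-up applicant features; the feature names, thresholds, and data are illustrative, not any bank's actual setup.)

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Hypothetical historical loans: [annual_salary, total_debt, fico_score]
X = np.column_stack([
    rng.normal(90_000, 25_000, 5_000),   # salary
    rng.normal(60_000, 30_000, 5_000),   # debt
    rng.normal(700, 50, 5_000),          # FICO
])
# Label 1 = the loan turned out good; this labeling rule is synthetic
y = ((X[:, 0] - X[:, 1]) / 1_000 + (X[:, 2] - 650) / 10
     + rng.normal(0, 5, 5_000) > 0).astype(int)

# The "artifact" Krishna mentions: patterns encoded in decision trees
model = RandomForestClassifier(n_estimators=100).fit(X, y)

# Score a new applicant when they apply for a loan
applicant = [[85_000, 40_000, 690]]
print("P(good loan) =", model.predict_proba(applicant)[0, 1])
```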
Now, therein lies the problem as well. If your data starts changing, then your model's accuracy may not be what it was when the model was trained. So let's say I train my model today, and I deploy it to production, and it's running and predicting credit risk scores for my customers. A few months later, let's say things have changed in the market. Maybe the Fed increased interest rates or decreased them, or there are global effects that could be happening, the war in Ukraine, or, say, the pandemic when it happened two years ago. There are so many things that can change, at the macro level as well as the local level, that can affect the loan applicants that I'm getting. People's salaries might change over time. The kind of people that are applying for my loans could change. And when that data changes, the model that I trained a few months ago can actually perform in a suboptimal manner; it can actually lose accuracy in how it's predicting credit scores. [00:04:34] Speaker A: I was going to use an oddball example about, well, it turns out the people that live in New Jersey drive vintage Chevrolet Novas. But you mentioned salary. I think that's really important. There are times when the economy goes down and a lot of Wall Street folks are getting smaller and smaller bonuses. That would drive salaries down, which could have an effect. The flip side is, more and more people are making closer and closer to $15 or $20 an hour; more people are getting paid more. Here in Vermont, they're offering $25 an hour to drive a bus during the snow season. That affects things. So that's a data point, salary, which obviously is very relevant to a loan. Now, that means the data itself changes. Do you then retrain the model? [00:05:22] Speaker C: You need to know when to retrain that model. There's a phenomenon we call data drift: this shift in data is called data drift, and it can cause models to drift in performance, so models can lose accuracy. Now, the problem, though, is you need to detect when that happens so you can actually retrain. Or maybe the model is drifting because of problems within your data pipeline. Some of these data shifts can be real data shifts, where there is a real change happening in the environment, but they could also be system problems. Machine learning runs on large-scale data infrastructure, and these data pipelines can break and might be sending incorrect, noisy data to your models. That can also cause the model's performance to drift. So you have to figure out what has changed, whether this is a real change or a system error, and then see if you need to retrain the model. Because retraining the model is a costly exercise for most teams: you need to spend more compute hours and people hours to retrain the model and come back with a better model. So this is where tools like Fiddler can help you detect model drift, but also help you root-cause what the underlying problem is, so that you know what needs to be done.
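(A minimal sketch of the kind of drift detection Krishna describes, comparing a feature's training and production distributions with the Population Stability Index. The 0.2 alert threshold and the synthetic salary data are illustrative assumptions, not Fiddler's actual method.)

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Measure how far a production distribution (actual) has shifted
    from the training distribution (expected). PSI > 0.2 is a common
    rule of thumb for meaningful drift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Clip to avoid log(0) on empty bins
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
training_salaries = rng.normal(80_000, 20_000, 10_000)
production_salaries = rng.normal(95_000, 25_000, 10_000)  # market shifted

psi = population_stability_index(training_salaries, production_salaries)
if psi > 0.2:
    print(f"Salary drift detected (PSI={psi:.2f}); root-cause before retraining.")
```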
[00:06:42] Speaker A: So it's interesting you say there is a cost involved in retraining the model. It would be interesting, because I could easily argue the flip side: there's a cost involved if you don't retrain the model and you're giving out bad loans. There's a terminology I'd love you to explain a little bit. Is it exactly what we've been talking about: model performance management? [00:07:02] Speaker C: So all we have described so far is how you manage model performance, so that the models you've deployed keep running intact. Your goal, as you said, is: my models are now serving business-critical use cases. If my credit risk model loses performance, then it's going to affect the quality of the loans that I approve and deny, and it's going to affect my business metrics. Suddenly, if I've approved a bunch of bad loans, then my business is going to lose money over a period of time. So it's very important to manage the performance of these models while they're running in production: continuously monitoring them, being able to analyze how they're performing, explaining predictions. Why was this particular loan denied? Why was this particular loan accepted? So you can, first of all, build a culture of transparency within your organization, so everyone gets visibility into what's going on with your models, but also be able to make them better over time. [00:07:55] Speaker A: As I build my ML dictionary, which I'm going to make millions by publishing: model monitoring versus model performance management. Is model monitoring one subset of MPM? [00:08:08] Speaker C: That's right. So MPM, the way we describe it, has four pillars. The first, and the most important aspect, is continuous model monitoring. The second is being able to do explainability of model predictions, so that when someone asks you how the model is doing, you have an answer, or so you yourself know how to improve the model. Let's say a fraud detection model is suddenly throwing a whole bunch of false positives. Why is the model throwing a bunch of false positives? What has changed? So you can explain the model further. The third is root cause analytics. When you see a performance degradation issue, let's say you get an alert that the model is drifting, you want to be able to root-cause and analyze it, so you can see if it's a data pipeline issue or a real data drift issue. And when you need to retrain the model, what do you need to retrain it? Let's say things have changed from the way the model was trained, like the training data set is now very different from the production data set. What has changed, so you know you can build a better model? The fourth pillar is fairness. You want to make sure that the models are performing in a fair and equitable manner for your customers. [00:09:22] Speaker A: Great. I'm going to come back to explainability and fairness in particular, but just to finish up this dictionary that I'm building here, and I have a feeling if I ask 50 people, I might get 50 different definitions, but I really want yours, Krishna: MLOps, please. [00:09:38] Speaker C: Yes, MLOps is this new term that describes the processes and tools that you need to operationalize machine learning and AI applications in production. If you rewind a few years, most of the data science and machine learning work was happening in research at a lot of companies. There are obviously the tech companies that were advanced and had been doing machine learning for a while, but if you look at the general enterprise, most of the machine learning and data science work was limited to research or, at best, ad hoc analytical use cases. Now, with the advent of tools that are helping people operationalize these models, and with the availability of large-scale data sets and compute power, the MLOps paradigm has started forming. So what it constitutes is being able to train models at large scale, being able to experiment with different types of models, being able to capture all of the data, the feature data, and store it and make it accessible, and being able to serve the model at large scale against traffic. For example, a consumer Internet website may choose to use machine learning to provide recommendations for its customers, and it should be able to serve that at high throughput, with millions of people using the website for e-commerce or news recommendations. And then being able to monitor this model: how is the model performing in production? Doing all of these steps in an operational manner, and being able to retrain models and relaunch them and keep them intact for your business, this entire area is called MLOps, and you need a set of processes and tools to actually get this right.
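(One small slice of the MLOps loop Krishna lists: serving a prediction while logging the full input/output record so it can be monitored later. This is a hypothetical sketch; the placeholder scoring logic and the file-based log stand in for a real model registry and event pipeline.)

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class LoanApplication:
    salary: float
    debt: float
    fico: int

def predict_risk(app: LoanApplication) -> float:
    # Placeholder scoring logic; in a real stack this would call a trained
    # model loaded from a model registry.
    raw = app.debt / (app.salary + 1) - (app.fico - 600) / 1_000
    return max(0.0, min(1.0, raw))

def serve(app: LoanApplication, log_path: str = "predictions.jsonl") -> float:
    """Serve one prediction and log the record, so monitoring can later
    compute drift and performance metrics over production traffic."""
    score = predict_risk(app)
    record = {"ts": time.time(), "inputs": asdict(app), "risk_score": score}
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return score

print(serve(LoanApplication(salary=95_000, debt=20_000, fico=710)))
```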
[00:11:27] Speaker A: Excellent. So you talked about data drift and data integrity and how we train models. Let's take that to the next step: how would that then be used? I think I'm moving from the world of the objective to the subjective, to detecting model bias. And bias, I guess, could be errors in the model, and I understand that, but maybe bias is more than that. So talk to me about that, please. [00:11:52] Speaker C: Yeah. So bias essentially is a model performance issue, but it is specific in the sense that you're looking for model performance variations across different segments of the population within your data. So let us say, again going back to the credit scoring example, that my model's accuracy is 85% overall, and I'm happy with it. But within that 85% accuracy, it may be highly accurate for a certain segment, maybe 95% accurate, and only 60% accurate for another segment. [00:12:27] Speaker A: Right. [00:12:28] Speaker C: And so the high-level model metrics hide these types of performance issues that the model might be having on different subsegments. And when those subsegments are related to ethnicity or gender or age, or anything that at a human level can be considered potential bias, it surfaces as model bias. You want to see how your model is performing, whether it is performing equitably across different segments, or whether there is any disparate impact that the model might be having across genders or ethnicities. So there is a whole bunch of metrics that you can look at to understand model performance across these different segments.
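(A minimal sketch of the slice-level view Krishna describes, breaking one overall accuracy number down by segment. The toy labels and segment names are made up to show how a healthy aggregate can hide a weak segment.)

```python
import numpy as np

def segment_accuracy(y_true, y_pred, segments):
    """Return {segment: accuracy} so per-segment gaps are visible
    instead of being averaged away in one overall metric."""
    y_true, y_pred, segments = map(np.asarray, (y_true, y_pred, segments))
    return {
        seg: float((y_pred[segments == seg] == y_true[segments == seg]).mean())
        for seg in np.unique(segments)
    }

y_true   = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred   = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
segments = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

overall = float(np.mean(np.array(y_true) == np.array(y_pred)))
print(f"overall accuracy: {overall:.0%}")          # 80% looks fine...
print(segment_accuracy(y_true, y_pred, segments))  # {'A': 1.0, 'B': 0.6}
```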
[00:13:13] Speaker A: I want to go to an interesting space, which is not necessarily ethnicity, and I don't know if it's an urban legend or whatever, but supposedly there was a study, and I have no idea where I got this: if somebody approved three loans in a row, the odds are the fourth one would not be approved. Now, subjectively, someone has to decide on the pattern of "in a row," or the pattern that judges are more likely to give somebody a better result after having lunch. How do you build that corpus of knowledge into this detection of biases? Because I think we really have gone from the objective to, well, not quite the subjective, but at the end of the day, the creation of these measures is somewhat subjective. Does that make sense? [00:14:03] Speaker C: Great question, actually. As you just said, let's say I'm a human underwriter. I've approved three loans in a row, and maybe on the fourth loan I'm a little bit more strict, or maybe I deny it because of my underlying biases. So one of the good things about actually using machine learning is that you're not getting into such human bias issues, because it's the machine that's making those predictions. So that's a good thing. But the downside of it is that the data we are feeding this machine is historical data, where a lot of the decisions were made by humans. When you're trying to train a new loan underwriting model, a lot of your previous historical data may have human biases already captured in it. And because we are not yet a perfectly equitable society, the data that we collect from society will also be imbalanced. The classic example is face recognition systems. When the first face recognition systems were trained, most of the data represented a certain ethnicity, and certain ethnicities were completely missing from it. Sometimes this is done unknowingly; sometimes it can be done knowingly. But the thing is, when these data sets then go into the model, the model becomes biased. So machine learning on its own is not biased; it's dependent on the data. And because the data itself can be biased, we need to look into that. But the good thing is, as we move more and more towards machine learning, there will be less chance of human bias, of this sort of subjectivity, in the future. [00:15:41] Speaker A: Great. So what I'd like to get to then, and it's interesting, as I was doing my research for this podcast, you had a number of really interesting use cases. Four of the five use cases on the site were about explainability, whether it's churn detection or governance or underwriting. So this is a perfect, natural segue from bias. Talk to me about what explainability means to you, and let's talk about what Fiddler does to help people understand what's coming out of a model. [00:16:09] Speaker C: So, before machine learning and AI got widespread, the way we used to build software that would make decisions was with highly rule-based systems. Let's say I'm trying to build a credit scoring system 20 years ago. It would basically use certain rules: let's say if someone's salary is above, say, $10,000 a month, or their previous debt is less than $50,000. I can compose these types of rules, or a formula, that can give out the credit score of a person that is applying for my loan. And it's highly interpretable: I can look at the rule and understand how the rule was made. As a human, I can look at all the corner cases of it. I can test the rule with a whole bunch of corner cases, when it would approve a loan and when it would deny a loan.
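(A minimal sketch of the kind of hand-written, fully interpretable rule Krishna is describing; the thresholds are the illustrative ones he mentions, not a real underwriting policy.)

```python
def rule_based_credit_decision(monthly_salary: float, total_debt: float) -> str:
    """Every threshold is visible, and a human can enumerate and test
    the corner cases by hand."""
    if monthly_salary >= 10_000 and total_debt < 50_000:
        return "approve"
    return "deny"

# Corner cases can be checked explicitly:
assert rule_based_credit_decision(12_000, 20_000) == "approve"
assert rule_based_credit_decision(12_000, 80_000) == "deny"
assert rule_based_credit_decision(4_000, 20_000) == "deny"
```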
Now, with machine learning, this rule-based system is being replaced by a machine learning model that takes these attributes and spits out a credit score. As a human, I have lost that interpretability. How is the model arriving at this credit score? That is the question explainability systems are trying to solve. They are bringing back that human understanding capability, so that you can see how the model is coming up with the prediction. In the case of Fiddler, the way we approach it is to provide explainability in the context of inputs. So we say, suppose someone's credit score is 700 or higher. These are the inputs that are driving the credit score up; these are the inputs that are driving the credit score down; and this is the impact each of these inputs is having on the model. So this person having $100,000 of salary per annum is having a positive attribution towards their credit score, or their previous debt being $100,000 or more is having a negative attribution to their credit score. That's how we help you understand how these models are working, and then we help you do the fiddling, where you can ask counterfactual questions. You can ask: what if the person had $150,000 of salary and only $50,000 of previous debt? Would their credit score be higher or lower? And that's how you can reason about the model.
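(A minimal sketch of input attribution and a counterfactual query in the spirit Krishna describes. The toy model, the baseline-substitution attribution, and the feature values are illustrative assumptions; production tools use techniques such as Shapley values rather than this crude swap.)

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Hypothetical loan data: [annual_salary, total_debt]; label 1 = good loan
X = rng.normal([100_000, 80_000], [30_000, 40_000], size=(2_000, 2))
y = ((X[:, 0] - X[:, 1]) + rng.normal(0, 20_000, 2_000) > 0).astype(int)
model = GradientBoostingClassifier().fit(X, y)

def attributions(model, x, baseline):
    """How much does swapping each input to a baseline value move the
    model's score? A crude stand-in for Shapley-style attributions."""
    score = model.predict_proba([x])[0, 1]
    out = {}
    for i in range(len(x)):
        swapped = np.array(x, dtype=float)
        swapped[i] = baseline[i]
        out[f"feature_{i}"] = round(score - model.predict_proba([swapped])[0, 1], 3)
    return score, out

applicant = [100_000, 100_000]
score, attr = attributions(model, applicant, baseline=X.mean(axis=0))
print(f"score={score:.2f}, attributions={attr}")

# Counterfactual: higher salary, lower debt; does the score improve?
print("what-if score:", model.predict_proba([[150_000, 50_000]])[0, 1])
```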
[00:18:37] Speaker A: So I like how you're looking at, I think the term is weightings, of the different data inputs. Would you have been able to go back some number of years to the original Google face recognition system and say, hey, your data seems to only have this type of face in it? Or would that not work, because you'd only be able to look at eyes and noses and chins, or whatever they were looking at for facial recognition? [00:18:59] Speaker C: Yeah. I mean, today there are algorithms available that can connect model problems to training data set problems. This is basically explanations by example, where they can essentially explain to you that, hey, this particular face is not being recognized, or is being misclassified, and they can connect back to the training data set and say that you do not have enough representation of this type of data point in your training data set. [00:19:29] Speaker A: I'm thinking a little bit about the perennial discussion of build versus buy, and companies are going to have to decide if they're going to build or buy models. You are, in my mind, kind of an unbiased third party overseeing things. It sounds like this is not something that anybody should always keep in-house; they should be looking to someone such as a Fiddler to help them understand it better. My next subject is going to be about trust, but it sounds like, by being that outsider, you can help them do a better job of understanding and trusting what they have. Is that fair? [00:20:03] Speaker C: The moment you are putting machine learning models into your business use cases, whether that's credit scoring or recommendations or fraud detection or whatever the business use case is, you have an obligation to know how those things are working, for your business, for your organization, and for your customers. And without monitoring those models, you're running the risk of them going wrong and actually hurting your business, hurting your reputation, and potentially putting you into a regulatory compliance issue, if that applies in your industry. What Fiddler does is help you foolproof this and give you peace of mind, by continuously monitoring those models and giving you alerts and insights so you can fix issues before they become really bad. And we are agnostic to the types of models our customers train and develop, because we are a neutral third party. We are essentially a watchdog, watching all of your models, looking at how they're performing, and helping you detect issues with those models. [00:21:12] Speaker A: That's terrific. So I do want to close on trust, and you've got some really interesting discussions on your website about model monitoring, and particularly fairness and explainable AI. So talk to me a little more about how, at the end of the day, the trust will happen between a company and their potential users, or trust within the company in the model. Talk to me about your view of trust, please. [00:21:39] Speaker C: Yeah, absolutely. Three years ago, when Apple launched their credit card, it was basically approving loans, or credit card applications, automatically. And when users applied online, certain users were getting very low credit limits. Women especially were getting 10x lower credit limits than men at the time. [00:22:00] Speaker A: I remember that one well: the spouses, and they had the exact same credit record. I remember that very well. [00:22:04] Speaker C: And there was a very large bank that was supporting the Apple credit card. And when customers complained to that bank, the response they got from the customer support teams was: oh, we don't know, it's just the algorithm. And one of the users was very angry with this response, started a big tweet thread, and it became a big news story that eventually led to a regulatory probe. So what happened at the end of the day? Essentially, Apple or the underlying bank used certain algorithms, machine learning or more complex algorithms, to predict these credit limits. And more importantly, they did not have the transparency within their organization to answer those questions when customers actually complained about it. They probably did not have monitoring around how the models were performing across different segments, which would have let them catch those issues upfront, before they became too bad for those customers. This is essentially the trust, right? The trust with your customers, trust within your employees. The customer support organization does not know how those algorithms work, and they're not in a position to help their own users. So this is the trust that we are talking about. How do we build that trust? By creating transparency. As humans, we rely on each other as a species. We are able to scale so successfully because we are able to scale trust, unlike, say, chimpanzees. If you assemble 1,000 chimpanzees, there might be a quarrel, or a big fight. But hundreds of thousands of humans can go to a big stadium and watch a football game. As humans, we are able to abstract things out because we are able to trust; we're able to scale trust. A billion people can elect a president or a prime minister in a country. So this is a very important phenomenon. For humans, transparency is a very, very important factor in building trust. The same goes with a machine. If the machine is making decisions, how do I, as a human, trust it? The way to do that is to build transparency. If you're using machine learning and AI, give that transparency to the people who are building it, the people who are consuming it in your business, and then your end users. And that's how you build that trust. [00:24:11] Speaker A: Context matters. I think this is great. We talked about that example. I think what made the Apple example so horrendous in the eyes of many people was that Apple pushed really hard around privacy and trust. And not to editorialize too much, but if it were Facebook's credit card, it wouldn't have been as big a deal.
And I hadn't thought about it before, but I like the concept of going to a stadium for a reason. There's a reason; there's contextualizing why certain things happen. So that puts trust in a very tangible way for people to think about. So, Krishna, I think this was just a fantastic discussion. Thank you so much for spending the time with us. It was just a pleasure. Thanks so much. [00:24:52] Speaker C: Absolutely, great questions. I enjoyed the interview. Have a nice day.

Other Episodes

Episode 34

November 25, 2019 00:09:14

Episode 34: Information Security Isn't Just About Tech

The costs of data breaches are rising quickly. Just look at the recently disclosed 2014 data breach at Yahoo that's now threatening to derail...


Episode 52

November 25, 2019 00:23:22

Episode 52: Differential Privacy Demystified (Part I)

Apple made headlines in 2016 when it started talking about differential privacy. But what exactly is it? And what opportunities can it create for...


Episode 9

May 19, 2022 00:19:01

Why Inclusive Design Builds Better Products With Fable’s Alwar Pillai

The old tech adage goes that you should move fast and break things, but moving fast could mean you could be leaving some of...
