Lowering Barriers to AI Adoption with AutoML and Interpretability
Overview
Building good machine learning models is difficult and time consuming, and few engineers and scientists have the necessary experience. Automated Machine Learning (AutoML) simplifies that process to a few steps, identifying the best model and optimizing its hyperparameters in a single step, thus making machine learning accessible to any engineer. We will also demonstrate various interpretability methods available in MATLAB that overcome the black box nature of machine learning, lowering the bar to adoption of machine learning in industries that cannot tolerate black box models, including finance and medical applications. Finally, we explain how incremental learning makes models improve over time and adapt to changing conditions.
Highlights
- Learn the three steps to obtain an optimized predictive model from raw signal or image data
- Demonstrate various interpretability methods to explain model predictions
- Apply incremental learning to adapt models to changes in the environment
About the Presenter
Bernhard Suhm is the product manager for Machine Learning at bat365. He works closely with customer-facing and development teams to address customer needs and market trends in our machine learning related products, primarily the Statistics and Machine Learning Toolbox. Prior to joining bat365, Bernhard applied analytics to optimizing the delivery of customer service in call centers, after specializing in speech user interfaces in his PhD from Carnegie Mellon and Karlsruhe University (Germany).
Recorded: 15 Dec 2020
Hello. My name is Bernhard Suhm. And I'm the product manager for Statistics and Machine Learning at The bat365. Welcome to this webinar on how you can lower barriers to AI adoption with AutoML and Interpretability.
Where is AI hitting barriers to adoption? Here are a couple of examples. AI models aren't as explainable as traditional methods. A data science team in a financial services company developed a neural net-based liquidity model, but couldn't explain how it worked to senior stakeholders, who then withheld their approval.
Another example is unexpected bias sneaking into the model. The AI model that a credit card issuer used to determine credit limits showed unintended bias against female card users. As a third example, the data-driven approach has long been expected to help improve delivery of health care. This and similar articles identify multiple barriers to adoption in this segment: fear of fatal errors that cannot be explained, for example, or that automation is seen as a threat to existing jobs.
Today we'll focus on addressing these two barriers. First, obtaining optimized models without extensive iterative tuning efforts and expertise by using AutoML. Second, overcoming objections to the blackbox nature of AI models.
I'll illustrate solutions to these challenges using two examples-- human activity recognition; and from the medical field, ECG classification. While the bulk of this presentation focuses on machine learning as AI, I'll provide an overview of similar solutions that exist for deep learning. I'll conclude by addressing some of the other barriers to adoption of AI.
Stepping back, it's actually quite amazing how much AI has already been adopted across many industries. Here are some examples from our users. BMW used various sensors to predict whether the vehicle is entering an oversteering [INAUDIBLE]. In manufacturing, the semiconductor company ASML improved measurement accuracy of overlay on silicon wafers. Atlas Copco improved the monitoring of thousands of deployed compressors by updating a digital twin using sensor data. And there are many other examples.
Before we get deeper into the topic, let's clarify what we mean by AI because there are multiple notions out there in the community. At the broadest level, AI can mean just any program that enables a computer or robot to perform tasks commonly associated with intelligence. Then, in the '80s, machine learning came along, and it applied statistical methods to learn tasks from data without explicit programming.
And then deep learning emerged as a type of machine learning that uses neural networks with many, many layers. What are the limits to adoption today? A Gartner study investigated that, interviewing more than 100 executives. And this is what they got back.
Near the top are the skills of staff who don't know how to apply AI, and available data, its scope and quality, as well as fear of the unknown, in particular not understanding the benefits of AI and how it's used. This and other studies confirm the top barriers to successful adoption of AI are lack of staff with sufficient skills, the blackbox nature of AI models, and the availability of labeled data.
Today's talk will focus on the top two barriers-- lack of AI skills and blackbox nature. Effective solutions to these barriers require an understanding of where lack of AI skill and more transparency impact the workflow for building and integrating AI models into systems.
This graphic shows the workflow for machine learning. But deep learning is impacted in similar ways, and we'll touch on some specifics later. So these are the tasks that are notoriously time consuming and require significant expertise.
After preprocessing data, which typically takes the majority of data scientists' time, comes feature engineering. Getting good features frequently requires domain knowledge, especially for communication, radar signals, and text processing.
Next, you need to decide which of many models suits your problem best. There's the saying "no free lunch" in machine learning. Even experts cannot tell which type of model will perform best, given a specific problem. Once you have chosen the model, you need to tune its hyperparameters just right to get good performance.
As you progress towards deployment, you may realize that the size of your model is too big. So you may have to go back and select a subset of performant features. Finally, integrating your model in a larger system may require you to explain model behavior to stakeholders who aren't familiar with how AI works.
If you apply deep learning instead of machine learning, you're essentially facing similar challenges. You still have to tune the hyperparameters of your deep network. Instead of optimizing models, you have to choose between different network architectures. And even though deep learning has become known to not require features, at least for signal and text applications, some form of data reduction is critical for good performance.
We'll demonstrate our solutions in the context of two applications and begin here with digital health-- the task of classifying the heart condition based on ECG data. The ECG signal is typically characterized by the so-called QRS wave. This is this big spike in the heart signal. And this signal causes the heart muscle to contract strongly and push out the oxygen-rich blood into all the arteries. And it needs a lot of power for that.
Other waves here are associated with transferring oxygen-poor blood to the lungs. But what experts look at is the distance between these large R spikes, the so-called R-R interval. That's all you need to remember to be able to follow what types of features we'll engineer.
So our first solution to lowering the skill needed to build AI models is building models interactively, as shown here in the so-called Classification Learner app, which allows you to compare the performance of many different popular models at the click of a button, evaluate the accuracy using different metrics, and then even interactively tune their hyperparameters.
But instead of talking about it, let's look at this in MATLAB. Let's talk in more detail what kind of features we could use for this task. Of course, you don't have the medical knowledge, but I mentioned the distance between these R peaks is important, the R-R Interval.
And we'll look at three consecutive such intervals. And I'll not only look at how fast they occur, but also at the ratios, the ratio of the first to the second and the third to the second. I'm not going to walk you through how to compute these features. Instead, we'll load them pre-computed into the workspace. But you can see them here: RR0, RR1, RR2, and then the ratios.
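The interval features just described can be sketched in a few lines. This is a plain-Python illustration, not the MATLAB code behind the pre-computed features in the demo; the function name `rr_features`, the dictionary keys, and the example peak times are hypothetical, and mapping RR0, RR1, RR2 to three consecutive intervals in time order is an assumption.

```python
def rr_features(r_peak_times):
    """Return three consecutive R-R intervals and two ratios from four R-peak times (seconds)."""
    if len(r_peak_times) != 4:
        raise ValueError("need exactly four consecutive R-peak times")
    # Differences between neighboring R peaks give three consecutive R-R intervals.
    rr0, rr1, rr2 = (b - a for a, b in zip(r_peak_times, r_peak_times[1:]))
    return {
        "RR0": rr0,
        "RR1": rr1,
        "RR2": rr2,
        "ratio_first_to_second": rr0 / rr1,
        "ratio_third_to_second": rr2 / rr1,
    }

# Made-up peak times: two regular beats followed by a delayed one.
features = rr_features([0.00, 0.80, 1.60, 2.80])
```

A ratio close to one indicates evenly spaced beats; the delayed third beat in this made-up example pushes the third-to-second ratio well above one.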
Next, we'll interactively build a couple of models in the Classification Learner. You can find it on the Apps tab. After loading the data, I already built a couple of models-- a fine decision tree and logistic regression, which is a good base model for two-class problems. But you see the accuracy is much lower.
The support vector machine is not much higher either. Let's try another model like a medium tree. That's performing significantly better. Finally, let's try an ensemble of trees, or random forest.
This seems to perform best. So that's the one we want to analyze more later. So to prepare for that, I'll export it into the workspace as a variable. This should have given you a sense of how easy it can be to build a couple of models and compare their performance.
But what does AutoML add? Here's again our workflow with all these complications, of which you just tasted a few. Well, the idea of AutoML, at least in theory, is to take away all this complexity and ideally go straight from the pre-processed data in your classification problem to a model that's optimized and ready to be deployed in your system. No AutoML solution on the market today comes close to this aspirational goal.
However, at bat365, we've developed a version of AutoML for engineering applications. Here's again our workflow. As a first step, solving the feature engineering problem, we lean on our experience with wavelets. They are really good at matching spikes and irregularities in real-world signals. You may not know much about wavelets, but that doesn't keep you from applying these techniques.
Second, since typically there are hundreds of wavelet features generated-- and that's too many for small models-- you need to select a subset of performant features. And many automated feature selection methods are supported by MATLAB. Finally, and key, the model selection with hyperparameter optimization built in. We have a single-step function that accomplishes those goals and delivers an optimized model.
Let's look at each of these three steps in more detail. First, feature generation with wavelet scattering. You may wonder what wavelets are. They help us to decompose signals into smaller parts. If you know Fourier analysis, that decomposes signals into their sinusoidal components. Similarly, wavelets decompose signals into their wavelet components.
However, wavelets are very constrained in time, and they can vary in width. So they are well-suited to match small irregularities in the signal, like alluded to in this animation.
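To make the point about time-localized wavelets concrete, here is a minimal sketch using the simplest wavelet, the Haar wavelet. It is one level of a plain wavelet transform in Python, not the wavelet scattering used in the demo, and the signal values are made up.

```python
import math

def haar_step(signal):
    """One level of the Haar wavelet transform: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / s for a, b in pairs]  # scaled local sums: the coarse shape
    detail = [(a - b) / s for a, b in pairs]  # scaled local differences: sharp changes
    return approx, detail

# A flat signal with a single sharp spike at index 5.
signal = [1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 1.0, 1.0]
approx, detail = haar_step(signal)

# The detail coefficient is nonzero only for the pair of samples that contains the spike.
spike_pair = max(range(len(detail)), key=lambda i: abs(detail[i]))
```

Because each coefficient depends on only two neighboring samples, the spike shows up in exactly one detail coefficient, which is the kind of localization a sinusoid spanning the whole signal cannot provide.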
So then, wavelet scattering decomposes a complex signal into its various wavelet components. The advantage is you don't have to figure out what kinds of wavelets to use. That's done automatically for you.
And then the features are computed for you as well. Some compare this to what the initial layers in deep networks are doing. They decompose the image into its various components. The advantage of wavelet scattering is that you don't need millions of examples of data, unlike deep networks. And this works for both signal and image data.
The next step in AutoML was feature selection, and many methods are available. Here, I will just highlight a few. NCA works pretty well for classification and regression. More recently, though, we have added MRMR, Minimum Redundancy Maximum Relevance, which computes really fast, is model-independent, and delivers a strong set of features. If you have a high-dimensional feature space and are looking for fast computation, these two univariate methods we have added recently may help you.
Third, let's understand the simultaneous model optimization with hyperparameter tuning better. In the animation that follows, you can see the optimization evaluating different model types and, behind the scenes, also many different hyperparameter combinations, so that the error rate shown here in blue decreases over time and converges to a minimum.
To make this process more efficient, we employ the same Bayesian optimization technique that we already applied to hyperparameter tuning in past releases. This allows us to efficiently traverse this large space of model and hyperparameter combinations and limit computational time. However, truth be told, it still is computationally expensive. So you will need to bring parallelization to bear on larger data sets. Parallel computing is supported for the AutoML functions.
We'll demonstrate AutoML on the human activity recognition task, where sensor data from the accelerometer of your mobile phone is captured and then classified according to which type of activity you are currently performing-- walking, standing, lying. First, we'll load this raw data and visualize it in what we call a stacked plot so you can see the accelerometer signals for X, Y, and Z.
And here I picked a section where the activity changes from walking to sitting. And you can see the stark difference. So you might think about what kind of features would capture this sort of change. But instead, we're applying wavelet scattering as the first step in AutoML, where you first define the wavelet framework using this function that just has the signal length and sampling frequency as input.
And then, wavelet scattering is applied to both the unbuffered train data and the raw test data. As you can see over here, it computes almost 500 wavelet features. So those are too many for a small model.
As step 2 in AutoML, we apply automated feature selection, and here the fscmrmr function, which stands for Minimum Redundancy Maximum Relevance of computed features. And it's going to rank those over 500 features and will just display the first 50 of those. Here comes the ranking. You see how the score drops off fairly quickly, but then there is a long tail.
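The ranking idea behind minimum redundancy, maximum relevance can be sketched as a greedy loop. This is a simplified plain-Python illustration and not the MATLAB function from the demo: it swaps in absolute Pearson correlation for the mutual information that MRMR is normally based on, and the function names, feature names, and data are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mrmr_rank(columns, labels, n_select):
    """Greedily pick features with high relevance to the labels and low redundancy with earlier picks."""
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    selected = []
    remaining = set(columns)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(columns[name], columns[s])) for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < n_select:
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "f_a": [0, 0, 0, 1, 1, 1, 1, 1],       # strongly related to the label
    "f_a_copy": [0, 0, 0, 2, 2, 2, 2, 2],  # scaled duplicate of f_a, fully redundant
    "f_b": [0, 0, 1, 0, 1, 1, 0, 1],       # weaker, but carries different information
}
ranking = mrmr_rank(columns, labels, n_select=2)
```

In this toy data the duplicate feature is just as relevant as the original, but once one of them is selected the other is penalized for redundancy, so the weaker but complementary feature is ranked second.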
So to arrive at a more compact model, we select just the first 25 features. And then as the third and central step in AutoML, we apply the model selection and tuning function fitcauto. Let's get that going.
And you can determine the various parameters. We limit the number of iterations to 50. So it's going to finish reasonably soon. And here it starts to evaluate the first couple of model and hyperparameter combinations. You can see a k-nearest neighbor model, SVM, tree, discriminant analysis. And here you can see how the error plot starts to converge towards lower values.
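The loop that was just running, trying combinations of model type and hyperparameters and tracking the best error so far, can be sketched as follows. This is a toy plain-Python illustration, not how fitcauto works internally: it uses random search where the talk describes Bayesian optimization, only two hypothetical model families (k-nearest neighbors and a one-feature threshold), leave-one-out error as the objective, and synthetic data.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean distance)."""
    nearest = sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def stump_predict(train, x, feature):
    """Threshold a single feature at the midpoint between the two class means."""
    v0 = [p[0][feature] for p in train if p[1] == 0]
    v1 = [p[0][feature] for p in train if p[1] == 1]
    mean0, mean1 = sum(v0) / len(v0), sum(v1) / len(v1)
    threshold = (mean0 + mean1) / 2
    return int((x[feature] > threshold) == (mean1 > mean0))

def loo_error(data, predict, **params):
    """Leave-one-out error rate of a predict(train, x, **params) function."""
    errors = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors += predict(train, x, **params) != y
    return errors / len(data)

def auto_select(data, n_iter, seed=0):
    """Randomly sample (model, hyperparameter) pairs and keep the one with the lowest error."""
    rng = random.Random(seed)
    search_space = {
        "knn": (knn_predict, lambda: {"k": rng.choice([1, 3, 5, 7])}),
        "stump": (stump_predict, lambda: {"feature": rng.randrange(len(data[0][0]))}),
    }
    best, history = None, []
    for _ in range(n_iter):
        name = rng.choice(sorted(search_space))
        predict, sample_params = search_space[name]
        params = sample_params()
        err = loo_error(data, predict, **params)
        if best is None or err < best[0]:
            best = (err, name, params)
        history.append(best[0])  # best error so far, like the converging error plot
    return best, history

# Synthetic two-class data: feature 0 separates the classes, feature 1 is noise.
data_rng = random.Random(1)
data = [((data_rng.gauss(2.0 * c, 0.5), data_rng.gauss(0.0, 1.0)), c)
        for c in (0, 1) for _ in range(15)]
best, history = auto_select(data, n_iter=10)
```

The `history` list never increases, which mirrors the error curve in the demo. A Bayesian optimizer would use earlier results to decide which combination to try next instead of sampling blindly.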
You may wonder how well does this actually work? We compared AutoML, as you just watched me step through in MATLAB, with the manual process of first figuring out what features to use and then trying out many different models and optimizing the hyperparameters manually.
First, we looked at the human activity recognition task that we just demonstrated. And we compare that also to heart sound classification, where you take phonocardiograms of heart sound recordings and then classify them into normal or abnormal. And here are the results.
AutoML matches the performance of models where someone knowledgeable in machine learning tried to apply the tricks of the trade and optimize the model performance. The point of AutoML is not that it's going to beat the manual optimization process, but it's a big win to get a model that achieves comparable accuracy without the complex and time-consuming model building process.
Let's move on to the second barrier to AI adoption, the blackbox nature of models. Ideally, we would have interpretable and highly accurate models available. But this chart shows that there's a trade-off between interpretability and predictive power.
There are easily interpretable models like decision trees, logistic regression, linear models. But they are not as performant as more complex models like boosted trees, SVMs, and deep learning networks. So interpretability is needed to overcome this blackbox nature.
But more specifically, at least in some industries, you have regulatory requirements like in finance. Or in Europe, there is GDPR. And for medical devices, there's regulations like by the FDA in the United States.
Finally, data scientists, in order to improve their models, like to understand in more detail how they are working. So giving them interpretability is helpful. I've used the term interpretability; more specifically, it means explaining the causality of model decisions, mostly in traditional machine learning. By contrast, I see explainable AI used mostly in the context of visualizing activations of deep neural networks.
Let's understand better where regulatory requirements ask for interpretability. I already mentioned the finance industry. And here, credit and market risk models are really required to be explainable. One reason is that the traditional models used for these use cases were very explainable. So that's what stakeholders, including senior management and regulators, are expecting.
The typical complex models that are very popular in finance are gradient-boosted trees and also some deep neural networks. And as a method for interpretability, Shapley is very popular. And you'll understand in a few minutes why.
By contrast, in the automotive and aerospace industries, you need to meet safety certification requirements. Deep neural networks are employed for image recognition and reinforcement learning to map out paths. The actual regulation hasn't been finalized, but bat365 has deep experience in such safety-critical applications for vehicles with ISO 26262, or flight regulation, with DO-178 as an example. And these two bodies mentioned here are currently working on issuing similar guidance for artificial intelligence. And bat365 is involved in some of these conversations.
As a third industry, in medical, regulatory approval is needed at least for some classes of medical devices. Deep neural networks are also used for image analysis, but so is classic machine learning. The landscape isn't quite as evolved.
If you're working in a different industry with specific interpretability requirements, we'd love to hear from you. I've mentioned a couple of popular interpretability methods here. Let me help you understand what interpretability methods are available and when they are used.
So at the beginning of this process, you may ask yourself the question, do the inherently explainable models deliver sufficient accuracy for my problem? If so, you can just use the inherent explanations available in those, like the weights for linear models and GAMs, or the branching in decision trees and posteriors for Bayesian models.
If those simple models aren't accurate enough, you need to look at more complex models. But then the next question is, do I need to explain just the local behavior? If that's the case, there is LIME available and Shapley.
The distinction is whether you need a complete explanation. And only Shapley delivers that, a complete explanation of the contribution of all the factors. And that's what's asked for in finance regulation. That's why Shapley is popular in that industry. If, however, you are looking for a global interpretation, feature importance and partial dependence plots are the methods to go with.
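What "complete explanation" means for Shapley values can be shown on a tiny example: the per-feature contributions add up exactly to the difference between the prediction and a baseline prediction. The sketch below computes exact Shapley values in plain Python by averaging over all feature orderings, which is only feasible for a handful of features. The scoring function is hypothetical, and representing an absent feature by a baseline value is an assumption; it is one common convention, not necessarily what any particular toolbox does.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley value of each feature for model(x), relative to model(baseline)."""
    n = len(x)
    contrib = [0.0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = x[i]            # switch feature i from its baseline to its query value
            value = model(current)
            contrib[i] += value - prev   # marginal contribution of feature i in this ordering
            prev = value
    return [c / len(orders) for c in contrib]

def model(v):
    # Hypothetical scoring function with two interaction terms.
    return 2.0 * v[0] + v[1] * v[2] + 0.5 * v[0] * v[1]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
total_explained = sum(phi)  # equals model(x) - model(baseline)
```

Each interaction term is split evenly between the features involved, and nothing is left unexplained: the contributions sum to the full gap between the query prediction and the baseline.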
LIME stands for Local Interpretable Model-Agnostic Explanations. That's quite a mouthful. But at heart, it's a fairly simple method. We approximate a complex model, as shown here with the blue dots and green dots and this complex decision boundary.
We approximate it not everywhere, but in the vicinity of a point of interest, shown here in yellow. And to do so, you pick a few labeled spots of both classes in that vicinity and then build a simple model using those, like a linear model in this case. And then you can use the inherent explainability of that simple model to provide an approximation for the complex one. So in this example, the weights of these different factors can explain the complex behavior in this vicinity.
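The local approximation just described can be sketched as a proximity-weighted linear fit around the point of interest. This is a bare-bones plain-Python illustration of the idea, not the LIME implementation used in the demo: the black-box function, the Gaussian sampling and kernel, and all parameter values are assumptions chosen for the example.

```python
import math
import random

def solve(a, b):
    """Solve a small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def lime_weights(black_box, query, n_samples=500, width=0.3, seed=0):
    """Fit a proximity-weighted linear surrogate near `query`; returns [intercept, w_1, ..., w_d]."""
    rng = random.Random(seed)
    d = len(query)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [q + rng.gauss(0.0, width) for q in query]    # perturbed sample near the query
        dist2 = sum((a - b) ** 2 for a, b in zip(z, query))
        w = math.exp(-dist2 / (2 * width ** 2))           # closer samples count more
        row = [1.0] + [a - b for a, b in zip(z, query)]   # features centered on the query
        y = black_box(z)
        for i in range(d + 1):                            # accumulate weighted normal equations
            xty[i] += w * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += w * row[i] * row[j]
    return solve(xtx, xty)

def black_box(v):
    # Hypothetical nonlinear model output (a smooth score rather than a hard class label).
    return math.tanh(3.0 * v[0]) + 0.1 * v[1] ** 2

coef = lime_weights(black_box, query=[0.0, 1.0])
```

Near this query point the surrogate assigns a much larger weight to the first feature than to the second. Those weights only describe the model's behavior in that neighborhood and say nothing about its behavior elsewhere.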
Instead of talking more about it, let's look at another demo. We'll go back to our first example from ECG classification. Because to apply model interpretability, you need expectations of how your model should behave, drawing on knowledge or data from your application domain. By contrast, it's difficult to derive such expectations for the wavelet features that we generated automatically in the human activity recognition example.
All right, remember, before we trained a couple of models and among them, a medium decision tree. One way to validate your model is to leverage its inherent explainability. For trees, that's the branches. And here we've just displayed the tree branching for this model.
You can see that the decision looked at these RR values. And if they are small enough, you go left, left again, left again. And then you end up with an abnormal heart. That makes sense.
If those intervals are really small, the heart is beating really fast. So that's probably a bad sign. But analyzing this data in detail would be cumbersome. So let's look at other interpretability methods-- global ones.
We can look at feature importance for the complex bagged tree or random forest. And that tells us which are the important features. Here we'll plot that. And you can see those RR intervals, again, are the top three. That makes sense.
And then we have these amplitudes, which we'll look at in a little bit, and then followed by the ratios. So one method to look at that globally is partial dependence plots. Let's do that here for one of these RR intervals.
And as you can see here in this graph, the likelihood of an abnormal heart decreases sharply after 0.05. So what that means is that if those spikes are really close together, the heart is beating really fast, and it's likely abnormal. Otherwise, it's OK.
And then we see something similar for other RR values. Now, let's look at the ratios. Here, we plot the same partial dependence plot for either one of these ratios. So if we look at ratios, a ratio near one, like here, means subsequent R spikes have the same distance. That means normal.
But if they don't have the same distance, like, for these high values, that's likely an irregular heartbeat. So that's a bad sign. So that's why the likelihood goes up.
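How a partial dependence curve is computed can be sketched briefly: force one feature to a fixed value for every row in the data, average the model's predictions, and repeat across a grid of values. The plain-Python example below uses a hypothetical scoring function and made-up rows; the 0.05 threshold only echoes the shape seen in the demo and is not taken from the trained model.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over all rows with `feature` forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[feature] = value   # overwrite one feature, keep the others as observed
            total += predict(modified)
        curve.append(total / len(rows))
    return curve

def predict(row):
    # Hypothetical "likelihood of abnormal" score from an RR interval and an RR ratio.
    rr, ratio = row
    score = 0.1
    if rr < 0.05:
        score += 0.6                    # very short interval: fast heartbeat
    if abs(ratio - 1.0) > 0.3:
        score += 0.3                    # uneven spacing between beats
    return score

rows = [(0.03, 1.0), (0.08, 1.0), (0.10, 1.5), (0.12, 0.9)]
grid = [0.02, 0.04, 0.06, 0.08]
curve = partial_dependence(predict, rows, feature=0, grid=grid)
```

The resulting curve is high for grid values below the threshold and drops afterwards, which is the pattern read off the plot for the RR values. Passing `feature=1` with a grid of ratio values would trace the curve for the ratios instead.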
After looking at some global interpretability methods, let's look at some local ones. So there you look at a particular point of interest. One case to apply that is to understand when the model went wrong, what happened.
So let's look here at some predictions and pick out the ones where the model made an error. We'll do that here by finding the wrong ones and then starting to prepare for applying LIME. And then here we actually fit the LIME object onto the second wrong prediction.
So what do we see here? Here, we see the LIME model for that point. We see that, like before, these RR ratios play an important role in the prediction. That makes sense, nothing surprising. So why did it go wrong?
Well, we see that the values for RR1 and RR2, which still have pretty high weight, were way higher than 0.05. So they were way in the range for normal hearts. So that's why at this place the model predicts a normal heart condition, even though it was actually abnormal. So here was an example of how you can use local model interpretability to understand when models make errors in prediction.
Now that you have a pretty good understanding of what interpretability is available for machine learning, let's look at deep learning. There, it mostly means explaining why the deep network made certain decisions.
Here's an example. This image of a mug got misclassified as a buckle. So now you can look to interpretability: what parts of the image did the deep network look at? It was focused on the buckle and not on the mug. So that gives you a hint that there is most likely still a bias in your training data.
And one way to address this is adding training examples of mugs where there is no hand and buckle in view. There's a bunch of methods available to do similar analysis, including occlusion sensitivity, Grad-CAM, and image LIME.
So I spent a fair amount of time talking about two challenges to adoption of AI-- the lack of AI skills and the blackbox nature of models. Let me round out today's webinar by talking about a few other challenges.
If you remember the model building workflow, it began with preprocessing your data. For sensor and numeric data, MATLAB provides interactive tools to tackle common problems with raw data, such as filling missing data, identifying outliers, and smoothing data. So we have live tasks available to do those interactively.
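The three cleanup steps named here can be sketched in plain Python. This is an illustration of the operations, not the MATLAB live tasks: the helper names are hypothetical, and forward filling, a threshold of three scaled median absolute deviations, and a three-point moving average are assumed choices.

```python
import statistics

def fill_missing(values):
    """Fill None entries with the most recent valid value (leading gaps use the first valid one)."""
    last = next(v for v in values if v is not None)
    filled = []
    for v in values:
        last = v if v is not None else last
        filled.append(last)
    return filled

def outlier_mask(values, threshold=3.0):
    """Flag points more than `threshold` scaled median absolute deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    scale = 1.4826 * mad   # makes the MAD comparable to a standard deviation for normal data
    return [scale > 0 and abs(v - med) > threshold * scale for v in values]

def moving_average(values, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

raw = [1.0, None, 1.2, 1.1, 9.0, 1.0, None, 0.9]   # made-up sensor readings
filled = fill_missing(raw)
mask = outlier_mask(filled)
clean = [v for v, bad in zip(filled, mask) if not bad]
smooth = moving_average(clean)
```

Here the outlier is simply dropped before smoothing; replacing it with an interpolated value would be an equally reasonable choice, depending on whether the time base has to be preserved.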
Next in the workflow, for supervised learning, having sufficient amounts of labeled data is a huge challenge. Specialized labeling tools can help, especially if they provide the option to automate some of the labeling by applying initial AI models to obtain rough labels so that the human annotator only has to review and occasionally correct such draft labels, such as in the signal, image, and video labeler apps that are available in the signal and image processing toolboxes.
Once you have trained a performant model, chances are you need to integrate it with a larger system. Simulation environments such as Simulink and model-based design are used in many industries to facilitate system integration and testing. Finally, once you've deployed your model, you have to monitor its performance and may need to update it at least occasionally. We are supporting incremental learning for some machine learning models, and model updates to deployed models without regenerating code.
Let me expand on this last point a little bit more. Once you have a performant initial model that you are ready to deploy, automated code generation can convert the high-level MATLAB into low-level C/C++ code, which then can be executed on your hardware and embedded within a larger system. Once the system is deployed, you typically collect data, and you can use that to improve your model, either applying incremental learning or retraining the whole model back in MATLAB.
Now comes the key point. As you move the model update into your production system, you want to avoid having to update the deployed code and having to go through certification procedures over and over again. Instead, you just pass the updated model parameters into the deployed system using communication mechanisms like over the air. A different use case for this workflow is performing software and hardware in the loop testing of complex systems with different model configurations.
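The separation between fixed deployed code and updatable parameters can be illustrated with a small sketch. This is plain Python rather than generated C/C++, and it assumes a simple logistic regression model updated by one stochastic gradient step per new observation; the function names, learning rate, and data stream are hypothetical.

```python
import math

def predict_proba(params, x):
    """Deployed scoring code: the logic stays fixed, the parameters come in as data."""
    z = params["bias"] + sum(w * v for w, v in zip(params["weights"], x))
    return 1.0 / (1.0 + math.exp(-z))

def incremental_update(params, x, y, learning_rate=0.1):
    """One stochastic-gradient step of logistic regression on a single new observation."""
    error = predict_proba(params, x) - y
    return {
        "bias": params["bias"] - learning_rate * error,
        "weights": [w - learning_rate * error * v for w, v in zip(params["weights"], x)],
    }

# Start from an untrained model and learn from a stream of newly collected observations.
params = {"bias": 0.0, "weights": [0.0, 0.0]}
stream = [([1.0, 0.2], 1), ([0.1, 1.0], 0), ([0.9, 0.1], 1), ([0.2, 0.8], 0)] * 50
for x, y in stream:
    params = incremental_update(params, x, y)

# Only `params` changes over time; `predict_proba` itself never needs to be redeployed.
p_positive = predict_proba(params, [1.0, 0.2])
p_negative = predict_proba(params, [0.1, 1.0])
```

Because `predict_proba` takes the parameters as an argument, shipping a model update amounts to sending a new `params` structure, which is the property that lets the deployed code stay untouched.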
So in conclusion, today, I have demonstrated how MATLAB can lower barriers to adopting AI, in particular, machine learning. I've spent a fair amount of time talking about making building models easy, interactively or by leaning on AutoML, so that engineers and domain experts without expertise can build models themselves, and so that even experienced practitioners are more productive building models.
I've just touched upon code generation that facilitates embedded deployment and integration with Simulink that's been made easier with a new library of native machine learning blocks. Similarly, for deep learning, there are blocks available to integrate those in Simulink models.
If you want to learn more about what I've talked about, here are some resources: the Classification Learner, as an example of an interactive tool that makes it easier to build models; a video on AutoML, where with three simple steps you can get optimized models; and a walk-through of how you apply different model interpretability methods. Going back to the basics, we also have a two-hour onramp that helps you get familiar with machine learning in MATLAB, and links to the demos that I referred to in this presentation.
The bulk of my presentation referred to machine learning, but similar tools are available for deep learning as well. To learn more, refer to these resources-- an introductory video and the interactive Deep Network Designer app, examples illustrating various visualization techniques to interpret the behavior of deep neural networks, automatically tuning hyperparameters using Experiment Manager for deep learning, and also a two-hour Deep Learning Onramp.
You can find these resources on bat365. For example, start with the solutions pages for machine and deep learning. The URL is displayed here. You can also request free trials for the Statistics and Machine Learning Toolbox and Deep Learning Toolbox, depending on which type of AI model you are applying. This brings me to the end of this webinar. Thank you very much for your interest.