Interpretable Machine Learning - a core skill for the competent data scientist

Machine learning interpretability is an increasingly important, yet oft-neglected, area. There are multiple complementary interpretability methods available to data scientists, and a working knowledge of these is critical if more powerful black-box methods are to be deployed in business settings.


Last year, at the International Conference on Machine Learning (ICML ’17), there was a tutorial session on interpretable machine learning. At the time, I was working with a client who was having trouble making sense of the logistic regression model I had built for them. Hoping to wow my client with the latest and greatest from the cutting edge, I attended, but was disappointed to find that I had already implemented much of what the presenters suggested - less-complex models, sensitivity testing, variable importance measures, etc.

Fortunately, there have been a few developments since then. But first:

Why should I care?

“We can't just throw something out there and assume it works just because it has math in it.”

Cathy O’Neil, author, Weapons of Math Destruction

Ideally, a model should be simple enough to be understood by those who use its outputs. However, machine learning is hard. Increasingly complex models move ever further from intuitive explanations, yet decision-makers have to rely on them to compete. Interpretable models are much easier for humans to trust.

The power and potential of machine learning are well documented. If that power is coupled with understanding, it opens the door to new innovations that have disruptive potential.

Let us examine the attributes of such a model.

The problem of abstraction and interpretability

To explain abstraction let’s look at a physical model, in this instance a car. What happens when you drive a car? You apply the brakes and use the steering to move around. Do you question how the engine works? Have you even looked under the bonnet? Here you are working at the highest level of abstraction. The engine is a black box for you. Lower down the hierarchy, there is a mechanic who knows how the engine works. The mechanic knows how to repair the engine but may be unaware of how it was designed or manufactured. This is again a lower level of abstraction. For the mechanic, the design of the engine is a black box.

Now, as a driver, the engine may be a black box but you still trust the car. Why? Because the car responds to your commands in a consistent and reliable manner.

If we extend this analogy to machine learning, we find that the modeller often has little intuition for the model's internal mechanics and, more crucially, that the user does not understand the modelling process. While this is OK (as with our driver), that trust needs to be developed before the model can be deployed with confidence.

So what needs to be considered to build that trust? Reliability and consistency certainly come into it. There are also issues of fairness, (emerging) regulatory requirements and model quality. These criteria are expanded on in the widget below:

The Status Quo - Interpretable Models

Here’s how things usually play out. An analyst or team is engaged to create a predictive model. If the business has little experience with statistical modelling, there will usually be a few iterations (with all kinds of ‘black box’ models attempted) before it is made clear to the analyst that the model needs to be explainable. More seasoned managers will state this up front. The analyst will then run a linear or logistic regression, or a decision tree. The model will take a hit on its accuracy, but oh well, at least the business can now understand it. Or can they?
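To make concrete what "explainable" usually means for a logistic regression, here is a minimal sketch of how its coefficients are read. The churn scenario, feature names, intercept and coefficient values are all invented for illustration; nothing here comes from a real client model. The only fact relied on is the standard one: exponentiating a logistic regression coefficient gives the multiplicative change in the odds for a one-unit increase in that feature, holding the others fixed.

```python
import math

# Hypothetical fitted logistic regression for customer churn.
# Feature names and numbers are invented for illustration only.
INTERCEPT = -1.0
COEFFICIENTS = {
    "support_calls": 0.40,   # per additional support call
    "tenure_years": -0.25,   # per additional year as a customer
}


def predict_probability(row):
    """Return P(churn) for one row via the logistic (sigmoid) function."""
    log_odds = INTERCEPT + sum(COEFFICIENTS[name] * row[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratios():
    """exp(coefficient) = multiplicative change in the odds per unit increase."""
    return {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}


for name, ratio in odds_ratios().items():
    direction = "raises" if ratio > 1 else "lowers"
    print(f"One extra unit of {name} {direction} the odds of churn by a factor of {ratio:.2f}")

customer = {"support_calls": 3, "tenure_years": 2}
print(f"P(churn) for the example customer: {predict_probability(customer):.2f}")
```

Even in this simple case, the explanation is in terms of odds rather than probabilities, which is one reason a business audience can still struggle with a supposedly interpretable model.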

Other interpretable models do exist (k-Nearest Neighbours and Naive Bayes, for example), but they generally still suffer from an accuracy deficit when compared with more sophisticated methods. The following graph displays this conundrum:

Accuracy / interpretability tradeoff. Note this graph is a guide only; results vary significantly across data sets.

Of course, in the real world results will vary depending on the data and the model parameters. But as a general rule, interpretability suffers as model complexity increases. Fortunately, the methods described in the next section go a long way towards alleviating this problem.

Interpretability measures

In more complex models, there is rarely a monotonic relationship between the target variable and its predictors (i.e. an increase in x does not always move y in the same direction). However, several methods have emerged to shed light on the inner workings of even the most complicated models. The techniques described below constitute some of the most successful attempts to reveal the underlying machinery driving predictions.
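As a small illustration of what a non-monotonic relationship looks like in practice, the sketch below runs a one-at-a-time sensitivity sweep: it varies a single input while holding the others at a baseline, then checks whether the predictions ever change direction. The `black_box_model` function, its formula, and the `age` and `income` features are all hypothetical stand-ins for an opaque model; the helper names are likewise assumptions made for this example.

```python
def black_box_model(age, income):
    """Stand-in for an opaque model; this formula is invented for illustration."""
    # The score peaks at age 45 and rises steadily with income.
    return 100.0 - 0.05 * (age - 45) ** 2 + 0.0002 * income


def sweep_feature(model, baseline, feature, values):
    """Vary one feature over `values`, holding the others at `baseline`."""
    predictions = []
    for value in values:
        inputs = dict(baseline, **{feature: value})
        predictions.append(model(**inputs))
    return predictions


def is_monotonic(predictions):
    """True if the predictions never change direction across the sweep."""
    steps = [b - a for a, b in zip(predictions, predictions[1:])]
    return all(s >= 0 for s in steps) or all(s <= 0 for s in steps)


baseline = {"age": 40, "income": 60000}
age_curve = sweep_feature(black_box_model, baseline, "age", range(20, 71, 10))
income_curve = sweep_feature(black_box_model, baseline, "income", range(20000, 100001, 20000))

print("age monotonic:", is_monotonic(age_curve))
print("income monotonic:", is_monotonic(income_curve))
```

A sweep like this only describes the model around the chosen baseline; a different baseline, or interactions between features, can produce a different curve, so it is a starting point rather than a full explanation.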


Interpretability depends on the nature of the machine learning model. As we progress towards more complex models, interpretability becomes progressively more difficult. However, there are now various ways to deal with this problem. The key thing to realise here is that, as with everything in data science, there is no silver bullet. Each one of these methods has its pros and cons, and only with practice and understanding can one deploy them purposefully to truly open up the black box.

Having said that, we at Verge Labs have endeavoured to make this easier by creating a utility that allows the user to input their data and model (or create a new one) and apply / explore many of these methods in a user-friendly manner. We also offer training and advisory services to provide you and your business with the expertise needed to really understand your model output. Please get in touch for more info.