AI – unlocking the black box

Exposing the inner workings of machine learning allows us to take full advantage of artificial intelligence


It has been called the ‘dark heart’ of artificial intelligence (AI) – the complicated ‘black box’ of hidden machine learning algorithms that many would have us believe will allow AI to take our jobs and run our lives.

But before that can happen, AI must be integrated into our everyday systems and protocols – including regulation. Product users and stakeholders must also have trust in AI and machine learning – otherwise they simply won’t use it.

New interpretability techniques are now making it possible to lift the lid on the black box.

Increased transparency equals more trust

Overcoming the “Why should I trust you?” scepticism about AI and machine learning is perhaps the biggest challenge that businesses need to master to gain trust from their stakeholders – customers, employees, shareholders, regulators and broader society.

This is particularly important in applications where predictions carry societal implications – for example, criminal justice, healthcare diagnostics, or financial lending. Transparency is a tool for detecting bias in machine learning models. Increased interpretability is also critical for meeting regulatory requirements such as the General Data Protection Regulation (GDPR) by making models auditable.

"Transparency helps us remove the fear factor associated with black boxes – and instils trust – something that all brands covet in today’s fast-changing business economy."

Making AI ethical is not the only advantage of introducing transparency: profitability goes hand in hand with a clearer view too. Providing explanations for predictions enables business users to give feedback for model improvement. They can assess why a model might fail and thereby increase the robustness of the model. Increased trust in models’ recommendations will also fast-track adoption of machine learning models among decision-makers, which can fundamentally change the way you do business and improve your P&L.

How to ‘explain’ a machine learning model

One way to help stakeholders understand why a machine learning model made a certain recommendation is to use easily interpretable models, such as linear regression. However, this can prevent us from using the full power of machine learning. The solution is to separate the explanations from the underlying machine learning model by using a model-agnostic interpretation layer. This frees up data scientists to use any machine learning model (for example, random forest or deep neural networks) or a combination of models to give the best performance.


An intuitive approach for creating the interpretation layer is to build a surrogate model that approximates the predictions of the underlying black box model as closely as possible while being interpretable. Building a surrogate model requires no information about the inner workings of the black box, only the relationship between input and predicted output.
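To make the surrogate idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: black_box is a hypothetical stand-in for an opaque model, and the surrogate is a straight line fitted only to the black box's inputs and predicted outputs. The R^2 between the two sets of predictions is one common way to express how faithfully the surrogate mimics the black box.

```python
import math
import random


def black_box(miles):
    """Hypothetical opaque model: car value as a non-linear function of miles."""
    return 20000.0 * math.exp(-miles / 60000.0)


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(ys, preds):
    """Share of the variance in ys that preds reproduce."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


rng = random.Random(0)
miles = [rng.uniform(0, 150000) for _ in range(500)]

# The surrogate only ever sees inputs and predicted outputs of the black box.
bb_preds = [black_box(m) for m in miles]

intercept, slope = fit_line(miles, bb_preds)
surrogate_preds = [intercept + slope * m for m in miles]
fidelity = r_squared(bb_preds, surrogate_preds)

print(f"surrogate: value = {intercept:.0f} + ({slope:.4f}) * miles")
print(f"fidelity (R^2 against the black box): {fidelity:.3f}")
```

A fidelity clearly below 1 signals that a single simple model cannot capture the black box everywhere, which is the motivation for explaining one prediction at a time.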

However, comprehending the entire black box model built with hundreds of features by using a simple interpretable surrogate model is difficult. The way to overcome this challenge is to zoom in on a single instance of prediction. This is called local interpretability.

Locally, the prediction could depend on just a handful of features, and so be easily interpretable, yielding a more accurate explanation. For example, the value of a second-hand car may depend non-linearly on the miles driven. Many other parameters such as the make, history, and condition of the car play a role. However, if you are looking only at a particular car, you could easily find how the value changes if the car has driven fewer or more miles.
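The second-hand car example can be sketched as follows. The pricing function car_value, its features, and all numbers are hypothetical; the point is only that, for one particular car, nudging the miles up and down while holding everything else fixed gives a simple, local answer, and that this answer differs from car to car because the underlying relationship is non-linear.

```python
import math


def car_value(miles, age_years, condition):
    """Hypothetical black box: non-linear in miles, scaled by age and condition."""
    base = 25000.0 * math.exp(-miles / 80000.0)
    return base * (0.93 ** age_years) * (0.6 + 0.1 * condition)


def local_effect(model, instance, feature, step):
    """Central finite difference: change in prediction per unit of one feature,
    with all other features of this instance held fixed."""
    up = dict(instance, **{feature: instance[feature] + step})
    down = dict(instance, **{feature: instance[feature] - step})
    return (model(**up) - model(**down)) / (2 * step)


this_car = {"miles": 40000, "age_years": 5, "condition": 4}
other_car = {"miles": 120000, "age_years": 5, "condition": 4}

slope_here = local_effect(car_value, this_car, "miles", step=1000)
slope_there = local_effect(car_value, other_car, "miles", step=1000)

print(f"value change per extra mile at 40,000 miles:  {slope_here:.4f}")
print(f"value change per extra mile at 120,000 miles: {slope_there:.4f}")
```

In this toy model, an extra mile costs more on the low-mileage car than on the high-mileage one, which a single global slope would hide.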

Using ready-made, open-source frameworks

Your data scientists can use a few open-source algorithms to start breaking open the black box. Let us consider a couple of them.

LIME (Local Interpretable Model-agnostic Explanations) is a path-breaking algorithm that explains the predictions of machine learning models. Designed at the University of Washington by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, LIME learns the behaviour of any black box model using local interpretable surrogate models (such as a linear model with only a few non-zero coefficients). LIME varies the input and sees how the predictions change. This lets LIME generate a new dataset consisting of input variations and the corresponding predictions of the black box model. LIME then trains a simple interpretable model on this dataset. The key challenge when using LIME is getting the correct definition of the local neighbourhood.
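The steps described for LIME can be sketched from scratch in plain Python. This is not the LIME library and not its exact procedure; it is a simplified illustration under stated assumptions: black_box is a hypothetical apartment pricing model, perturbations are Gaussian, proximity is scored with an exponential kernel, and the parameters scales and kernel_width are illustrative choices. The kernel width is what defines the local neighbourhood.

```python
import math
import random


def black_box(area_m2, floor):
    """Hypothetical opaque model for apartment value (illustrative only)."""
    return 3000.0 * area_m2 + 20.0 * area_m2 * floor + 500.0 * floor ** 2


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def explain_locally(model, instance, scales, kernel_width=0.75,
                    n_samples=2000, seed=0):
    """Fit a proximity-weighted linear surrogate around one instance."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        # 1. Vary the input around the instance (offsets in scaled units).
        z = [rng.gauss(0.0, 1.0) for _ in instance]
        x = [v + s * zi for v, s, zi in zip(instance, scales, z)]
        # 2. Record the black box prediction for the varied input.
        targets.append(model(*x))
        rows.append([1.0] + z)  # leading 1.0 is the intercept column
        # 3. Weight each sample by its closeness to the instance.
        dist_sq = sum(zi * zi for zi in z)
        weights.append(math.exp(-dist_sq / kernel_width ** 2))
    # 4. Weighted least squares: (X^T W X) beta = X^T W y.
    k = len(instance) + 1
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(w * r[i] * y for r, y, w in zip(rows, targets, weights))
            for i in range(k)]
    beta = solve(xtwx, xtwy)
    # Convert slopes from scaled units back to original feature units.
    return beta[0], [b / s for b, s in zip(beta[1:], scales)]


intercept, slopes = explain_locally(black_box, instance=[80.0, 4.0],
                                    scales=[5.0, 1.0])
print(f"local value per extra square metre: {slopes[0]:.0f}")
print(f"local value per extra floor:        {slopes[1]:.0f}")
```

For this toy model the local slopes should land close to the true gradient at the chosen apartment (3,080 per square metre and 5,600 per floor). Widening kernel_width pulls in more distant samples and gradually turns the local explanation into a global one.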

The other method that is gaining traction is SHAP (SHapley Additive exPlanations). SHAP explains the output of machine learning models by connecting cooperative game theory with local explanations. This method assumes that each feature is a ‘player’ in a game where the prediction is the payout. Players cooperate in a coalition and receive profit from this cooperation. The Shapley value tells us how to fairly distribute the ‘payout’ among the players.

Let us assume that your machine learning model is trained to predict the value of a residential apartment. Features such as area, floor number, proximity to the school, and the number of parking lots are the players. The Shapley value for the feature ‘second parking lot’ is the average marginal contribution of that feature value over all possible coalitions.

Coalitions, in this case, are all possible combinations of the other features: no other features at all (the empty coalition), ‘area’, ‘floor number’, ‘proximity to the school’, ‘area + floor number’, ‘area + proximity to the school’, ‘floor number + proximity to the school’, and ‘area + floor number + proximity to the school’.

For each of these coalitions, we compute the predicted property value with and without the value for the feature ‘second parking lot’ and take the difference to get the marginal contribution. The Shapley value is the weighted average of these marginal contributions. The key challenge when using SHAP is that it is compute-hungry and that, unlike LIME, it needs access to the data.
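The calculation above can be written out exactly for a toy case. The sketch below is not the SHAP library; it enumerates every coalition by brute force, which is only feasible for a handful of features. The pricing function predict_value, the BASELINE apartment, and all numbers are hypothetical. One further assumption: a feature that is ‘absent’ from a coalition is set to its baseline value, whereas SHAP implementations typically average over background data instead.

```python
from itertools import combinations
from math import factorial

# Hypothetical reference apartment and the apartment we want to explain.
BASELINE = {"area": 60, "floor": 1, "near_school": 0, "parking": 1}
APARTMENT = {"area": 85, "floor": 4, "near_school": 1, "parking": 2}


def predict_value(x):
    """Hypothetical black box for apartment value, with one interaction."""
    value = 2500 * x["area"] + 1500 * x["floor"] + 20000 * x["near_school"]
    # A second parking lot is worth more when the school is nearby.
    value += 12000 * (x["parking"] - 1) * (1 + 0.5 * x["near_school"])
    return value


def coalition_value(coalition):
    """Prediction with coalition members at the apartment's values
    and all other features held at the baseline."""
    x = {f: (APARTMENT[f] if f in coalition else BASELINE[f])
         for f in BASELINE}
    return predict_value(x)


def shapley_value(feature):
    """Weighted average of the feature's marginal contributions
    over all coalitions of the other features."""
    others = [f for f in BASELINE if f != feature]
    n = len(BASELINE)
    total = 0.0
    for size in range(len(others) + 1):
        # Standard Shapley weight for a coalition of this size.
        weight = factorial(size) * factorial(n - size - 1) / factorial(n)
        for coalition in combinations(others, size):
            with_f = coalition_value(set(coalition) | {feature})
            without_f = coalition_value(set(coalition))
            total += weight * (with_f - without_f)
    return total


shapley = {f: shapley_value(f) for f in BASELINE}
gap = predict_value(APARTMENT) - predict_value(BASELINE)

for name, value in shapley.items():
    print(f"{name:12s} {value:10.0f}")
print(f"sum of Shapley values: {sum(shapley.values()):.0f}, "
      f"prediction gap: {gap:.0f}")
```

The Shapley values add up to the gap between the explained prediction and the baseline prediction, which is what ‘fairly distributing the payout’ means in practice. The number of coalitions doubles with every added feature, which is why the method is compute-hungry and why practical tools rely on sampling or model-specific shortcuts.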

What does the model explain?

Let us look at an example of the model explanation. The bar chart below explains the drivers of the valuation of a residential property. Features such as size and number of rooms enhance the value of the property whereas crime rate and condition of the property present a drag on the property value.



You need to decide how you want to present this explanation to your stakeholders. The kind of explanation needed by your customers could be different to what your business users or regulators need. The chart could be easily converted into a text explanation using Natural Language Generation techniques.
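As a rough illustration of turning a chart into text, the sketch below uses a fixed sentence template rather than a full Natural Language Generation system. The function describe and the contribution figures are hypothetical and were chosen only to mirror the property example; in practice the inputs would be the signed feature contributions produced by a method such as LIME or SHAP.

```python
def describe(contributions, top_n=2):
    """Turn signed feature contributions into a one-sentence explanation."""
    items = list(contributions.items())
    # Largest positive drivers first, most negative drags first.
    positives = sorted((c for c in items if c[1] > 0),
                       key=lambda c: -c[1])[:top_n]
    negatives = sorted((c for c in items if c[1] < 0),
                       key=lambda c: c[1])[:top_n]
    parts = []
    if positives:
        parts.append("raised mainly by "
                     + " and ".join(name for name, _ in positives))
    if negatives:
        parts.append("lowered mainly by "
                     + " and ".join(name for name, _ in negatives))
    if not parts:
        return "No feature moved the valuation away from the baseline."
    return "The valuation is " + ", and ".join(parts) + "."


# Hypothetical contributions to a property valuation (currency units).
example = {
    "size": 42000.0,
    "number of rooms": 15000.0,
    "crime rate": -18000.0,
    "condition": -7000.0,
}

sentence = describe(example)
print(sentence)
```

The same contributions could feed a different template for each audience, for instance a short sentence for customers and a full ranked table for regulators.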

Making it happen

Introducing explainable AI does not require a significant upfront investment, especially for decision-tree-based models. LIME and SHAP are open-source frameworks. Your data scientists can easily download both from GitHub.

In the long run, you will reap the rewards by making your AI development ‘explainable by design’ by embedding interpretability methods into your Data-to-Value life cycle and governance controls. Not all models need the same high standard of interpretability though. The degree of interpretability required is your policy decision based on your risk appetite and stakeholder needs.

Moreover, channelling explainable AI outputs into automated controls will secure your models against unforeseen risks because your risk teams will be alerted to the potential problems.

Model interpretability is still in its early stages, but nothing is stopping us from benefiting from it now. Transparency helps us remove the fear factor associated with black boxes – and instils trust – something that all brands covet in today’s fast-changing business economy.
