Explainable AI — All you need to know. The what, how, why of explainable AI

Aug 17, 2020


Explainable AI aims to solve the black box problem: the idea that some AI systems, in particular machine learning-based ones, are black boxes. We can see what goes into the black box, e.g. a photograph of a cat, and what comes out, e.g. a labelled version of the photograph, but we can’t easily understand how and why the system made that determination.

AI black box
Lawtomated employee, Baloo, being classified by an AI image classifier.

What is explainable AI?

The aim of explainable AI is to create a suite of machine learning techniques that:

  1. Produce more explainable models, i.e. we understand how and why the system achieves its outcome given an input.
  2. Enable human users to understand, appropriately trust and effectively manage AI systems.

This is best understood by a comparison between most of today’s systems and the explainable AI systems:

DARPA Explainable AI Diagram
Source: here

As you can see, the “Tomorrow” view is a lot more helpful to the human with the task. They are able to understand how and why the system goes from inputs to outputs.

Why is explainable AI helpful?

Explainable AI systems don’t simply provide an output, which may or may not be useful to a human, e.g. “Cat” or “Not Cat” in response to a given photo. Instead, explainable AI systems provide an interface that supplies additional information or presents data inherent to the inner workings of the machine learning system, which in either case helps the human understand how and why that decision was arrived at by the AI.

In doing so explainable AI systems help us interpret how and why they behave, helping us better consider their outputs and provide us with auditing capabilities to better interrogate a system if it goes wrong.

Why is explainable AI necessary? Helicopters vs Guns

Because today’s AI systems rely heavily on processing data in order to determine the necessary patterns or rules that take an input x and perform some outcome y, they are themselves vulnerable to data manipulations. These manipulations, called adversarial attacks, can trick seemingly intelligent AI systems.

Image classifiers, a type of AI that assigns a label to an image (e.g. the “Cat” or “Not Cat” example above), are susceptible to this issue. Image classifiers analyse the pixel combinations and strengths (in terms of colour values) to determine which label relates to the overall image.

To hack an image classifier, a hacker can manipulate a single pixel in an image (undetectable to the human eye). If that pixel is significant to the AI’s mathematical model of how pixels relate to labels, then the AI may be misled into mislabelling the image.

Adversarial Attack Gun vs Helicopter
Source: here. © Wired

As this excellent Wired article explains, the process of finding the exact pixel manipulations to hack the classifier can be automated. Essentially, the researchers tried to fool the AI by systematically adjusting pixel values, analysing the impact on the classification and intelligently inching their pixel changes toward labels that were increasingly inaccurate vs. the actual label. In this way they convinced an AI image classifier that a photo of a machine gun was in fact a helicopter.
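To make the search loop concrete, here is a minimal sketch. It assumes a toy, hypothetical linear "classifier" over a four-pixel image (the `WEIGHTS` values and the function names are invented for illustration and are not from the Wired article). Real attacks target deep networks and keep the changes imperceptible; this toy only shows the shape of "adjust a pixel, check the scores, keep the change that moves furthest toward the wrong label".

```python
# Toy illustration only: hypothetical weights for a 4-pixel "image".
WEIGHTS = {
    "gun":        [0.9, 0.1, 0.4, 0.2],
    "helicopter": [0.2, 0.8, 0.3, 0.6],
}

def scores(image):
    """Score each label as a weighted sum of pixel values."""
    return {label: sum(w * p for w, p in zip(ws, image))
            for label, ws in WEIGHTS.items()}

def classify(image):
    """Return the label with the highest score."""
    s = scores(image)
    return max(s, key=s.get)

def adversarial_search(image, target, step=0.05, max_steps=200):
    """Greedily nudge one pixel per step toward the target label."""
    image = list(image)
    original = classify(image)
    for n in range(max_steps):
        if classify(image) == target:
            return image, n          # label flipped after n nudges
        best = None
        for i in range(len(image)):
            for delta in (-step, step):
                candidate = list(image)
                # Keep pixel values inside the valid range [0, 1].
                candidate[i] = min(1.0, max(0.0, candidate[i] + delta))
                s = scores(candidate)
                margin = s[target] - s[original]
                if best is None or margin > best[0]:
                    best = (margin, candidate)
        image = best[1]
    return image, max_steps          # gave up: label may not have flipped

original = [0.8, 0.2, 0.5, 0.3]
adversarial, nudges = adversarial_search(original, "helicopter")
print(classify(original), "->", classify(adversarial), "after", nudges, "nudges")
```

With only four pixels the changes needed are large and obvious. The point of the real attacks is that with many thousands of pixels, the same kind of search can succeed with changes far too small for a human to notice.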

Assassination by pixel

Sounds trivial, right? Wrong. Imagine the image classifier is designed to steer an autonomous vehicle. Imagine further that the car is carrying the president or prime minister of a country. Suppose a hacker replaces the stop sign at a junction with a subtly adjusted version. To any human, it’s a stop sign. But the neural network sees a Go sign because of an exploit in that sign’s presentation that is undetectable to the human eye.

Trolling Teslas

Sounds like futuristic nonsense, right? Also wrong. Security researchers from Chinese tech company Tencent did just this to spoof a Tesla autopilot system, causing it to swerve into the wrong lane against its designed behaviour. How did they do it? They used a few small stickers on the road surface to exploit a vulnerability in the autopilot’s computer vision system. See also this fascinating article on these ideas and similar experiments adjusting Stop signs to create unintended behaviour with computer vision systems.

Rick & Morty

This idea is also put to comedic effect in Season 4, Episode 2 of Rick & Morty. In that episode, titular character Rick is about to be shot by some robots before he spins his tin hat around to reveal a QR code, which tricks the robots into believing he is their commanding officer, a conceit he exploits to help them win a war.

Rick & Morty Robot QR Code
© Adult Swim

Explainability depends on complexity

In general:

  • More complex AI systems are more accurate but less explainable
  • Less complex models are less accurate but more explainable
Complexity vs Explainability of AI

What does complexity mean?

Complexity is a function of three concepts in machine learning systems:

1. Dimensionality: the more features (i.e. variables that the system analyses to produce a related output), the more dimensions the mathematical model has to consider, making it trickier to fit an AI model to data.

2. Linear vs Non-linear: whether there is direct proportionality between input and output, e.g. y = 2x where y is always twice x (linear) vs. not, e.g. y = x^2 where the ratio of y to x changes as x changes (non-linear).

3. Monotonicity: whether the relationship between an input feature and the output is always in one direction (i.e. the output always increases or always decreases as the input increases), in which case it is monotonic, else it is non-monotonic.
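The last two properties can be checked numerically on sample points. The sketch below is illustrative only: the helper names `is_proportional` and `is_monotonic` are made up for this example, "linear" is tested in the article's sense of direct proportionality, and passing on a handful of sample points is evidence rather than proof.

```python
def is_monotonic(f, xs):
    """True if f never changes direction over the sorted sample points xs."""
    ys = [f(x) for x in xs]
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    return all(d >= 0 for d in diffs) or all(d <= 0 for d in diffs)

def is_proportional(f, xs, tol=1e-9):
    """True if f(x) / x is the same constant at every non-zero sample point."""
    ratios = [f(x) / x for x in xs if x != 0]
    return max(ratios) - min(ratios) <= tol

xs = [x / 2 for x in range(-10, 11)]  # sample points from -5.0 to 5.0

def double(x):
    return 2 * x      # y = 2x: linear and monotonic

def square(x):
    return x ** 2     # y = x^2: non-linear, and non-monotonic on this range

def cube(x):
    return x ** 3     # y = x^3: non-linear but still monotonic

for f in (double, square, cube):
    print(f.__name__, is_proportional(f, xs), is_monotonic(f, xs))
```

The cube function is a useful reminder that the two properties are independent: a model can be non-linear yet still monotonic, which usually keeps it easier to explain than one that changes direction.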

For example, the below comparison by Patrick Hall of two AI models that map a customer’s age to their predicted number of purchases:

Explainability of two AI models

In the first example we have a linear monotonic two dimensional model, g(x) = 0.8x. In other words for a one unit increase in age (the input feature x), the number of purchases (the output y) increases by 0.8 on average.

The model’s behaviour is identical across the feature space, i.e. for any age the number of purchases is always predicted as 0.8 × the person’s age. Thus the model is globally (across the entire dataset) and locally (for a specific example) interpretable, so it is easy for us to explain how and why it determines the number of purchases someone will make based on age alone, given any age.

However, it is less accurate. The model (the line of best fit) does not closely map the relationship of age to number of purchases, resulting in lost profits or wasted marketing!

In the second example we have a non-linear non-monotonic two dimensional model, g(x) = f(x) (i.e. the algebraic function is a higher order polynomial).

Although more accurate, i.e. the line of best fit more or less exactly maps age to number of purchases, the model becomes harder to interpret globally: for any given age we can no longer simply say that the number of purchases is always 0.8 × the person’s age. Instead we must rely on local interpretability, i.e. the particular relationship between age and number of purchases for a particular age or age grouping. As the data demonstrates, the relationship is non-linear and non-monotonic.
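The difference between a global and a local explanation can be seen by measuring each model's slope at a few ages. This is a sketch under assumptions: `linear_model` is the g(x) = 0.8x example from above, while `curved_model` is a hypothetical polynomial chosen purely for illustration, not the curve in Patrick Hall's figure.

```python
def linear_model(age):
    return 0.8 * age  # g(x) = 0.8x, the first example

def curved_model(age):
    # Hypothetical stand-in for the second model: rises, peaks at 45, then falls.
    return -0.02 * (age - 45) ** 2 + 40

def local_slope(model, age, h=1e-3):
    """Central finite difference: change in predicted purchases per extra year at this age."""
    return (model(age + h) - model(age - h)) / (2 * h)

for age in (20, 45, 70):
    print(age,
          round(local_slope(linear_model, age), 3),
          round(local_slope(curved_model, age), 3))
```

For the linear model one number (0.8) explains every prediction. For the curved model the slope is positive for younger customers, roughly flat near the peak and negative for older ones, so any explanation has to be given age by age.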

Imagine adding more features (and thus dimensions), e.g. previous purchasing decisions, interests, health, wealth, socio-economic background, education etc, and you can appreciate how a more complex model becomes more accurate yet less explainable, as the mathematical relationships of inputs to outputs become harder to map!

The inequality of explainability

As described in DARPA’s research, not all AI techniques are created equal in terms of their capacity for explainability. For example:

Machine learning techniques and their explainability
Source: here

As you can see, more complex AI systems (e.g. neural networks) score high for performance but low for explainability, whereas decision trees score lower for performance but higher for explainability. This is not to say the limited explainability of complex models cannot be overcome, but it is a limitation of these systems that needs careful consideration.

But do all AI systems need to be explainable?

Balance explainability vs complexity, guided by criticality

In general, the more impactful the decisions an AI system makes, or the decisions its outputs inform, the more explainable it needs to be.

Impact vs Explainability for Explainable AI

Other factors designers and users of AI should consider in determining the level of explainability required include the below:

Explainable AI Key Concerns

So to summarise, the relationship of explainability, complexity (accuracy) and criticality is thus:

Criticality, Explainability and Complexity Mapping

Weighing these factors together results in a determination of necessary explainability, i.e. how much explainability is necessary in light of what the AI is required to do and the environment in which it operates.

How to evaluate a system’s necessary explainability?

Perform an explainable AI gap analysis, as follows:

Explainable AI Gap Analysis
  1. Work with the vendor (if a bought solution) or your own IT team (if a bought or built solution) to understand the extent to which the proposed / existing system allows users to explain how and why an output is produced (“System’s Explainability”).
  2. Perform a gap analysis of the System’s Explainability vs. the Necessary Explainability (as determined in the previous section).
  3. If System’s Explainability < Necessary Explainability, there is a negative gap. You should consider customising the solution (if possible) or buying / building something more explainable.
  4. If System’s Explainability > Necessary Explainability, there is a positive gap. You could consider increasing the complexity of the model to improve its accuracy, to the extent this does not reduce the System’s Explainability below the Necessary Explainability.
  5. In either scenario you will need to work with your IT team (to the extent the system is built or bought) and potentially also a vendor (to the extent the system is bought) to understand to what extent it is possible to tweak an existing system, or design a new one, to patch the identified gaps. Likewise you may need to re-engineer the surrounding people and process to reduce or eliminate such gaps.
  6. The above should be carried out at the outset of any build / buy discussion and throughout the system’s implementation to ensure no gap exists at the outset nor emerges over time.
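Steps 2 to 4 above boil down to a simple comparison. The sketch below assumes both levels have been scored on the same scale (a hypothetical 0 to 10 here, where 10 is fully explainable). The scale, the function name `explainability_gap` and the wording of the advice are illustrative assumptions, not an established methodology.

```python
def explainability_gap(system_score, necessary_score):
    """Compare a system's explainability with the level the use case needs.

    Both scores are assumed to sit on the same hypothetical scale,
    e.g. 0 (opaque) to 10 (fully explainable).
    """
    gap = system_score - necessary_score
    if gap < 0:
        advice = ("negative gap: customise the solution, "
                  "or buy / build something more explainable")
    elif gap > 0:
        advice = ("positive gap: room to increase model complexity for accuracy, "
                  "as long as explainability stays at or above the necessary level")
    else:
        advice = "no gap: re-check as the system and its environment change"
    return gap, advice

# Example: a loan-approval model scored 4 where the use case needs 7.
print(explainability_gap(4, 7))
```

The hard part in practice is agreeing the two scores, which is where the work with vendors, IT and the surrounding process in steps 1 and 5 comes in.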

For an intuitive mapping of explainability to criticality across different AI use cases, see this mapping by Everest Group:

Explainable AI mapped to criticality of decision and required explainability
Source: here

But not everyone is agreed. Explainable AI may be unnecessary

Despite increasing research into explainable AI, the AI community does not agree on whether or not it’s an important field of study.

Arguments for: testing = reliability = trustability

For example, Facebook’s Chief AI Scientist, Yann LeCun, suggests rigorous testing is enough to provide an explanation. According to LeCun, you can infer a model’s reasoning by observing how the model acted in many different situations. Google’s Chief Decision Intelligence Engineer, Cassie Kozyrkov, has made a similar case for why explainable AI will not deliver and supports thorough testing as an alternative.

Arguments against: reliability is not trustability, and we need more than trust

But many researchers dispute these arguments. Microsoft Research’s Rich Caruana suggests explainable AI is important, especially for sensitive applications, e.g. healthcare, law and so on.

Proponents on this side of the argument posit that those against explainable AI wrongly equate reliability (through testing) with trustability, i.e. they assume that if the model is predictable and broadly reliable, there’s no need for explanation.

Attractive as that is, those in favour of explainable AI point to examples such as self-driving cars, where the dynamic environments in which cars operate create too large a space of possible driving scenarios to test comprehensively, i.e. to the point where the distinction between reliability and trustability no longer matters.

A further argument for explainable AI is that we are not simply concerned with trust (nor reliability).

We also want to ensure other behaviours, e.g. regulatory compliance, legal compliance, detecting (and ameliorating) bias and protections against adversarial techniques such as those explained above.

Examples of where explainable AI would have helped

Sexist CV screening

For instance, a system built by Amazon to screen job applications was scrapped after it consistently downgraded female applicants vs male applicants. The system was trained on historical data overpopulated by male candidates. As a result, the system was biassed against women. The most interesting thing was that even after gender identifying features were removed (e.g. name and gender), the system still performed the same. The system had identified that male candidates tended to use certain words or phrases that female candidates did not. Roughly speaking male candidates used more confident and authoritative language to describe their achievements whereas female applicants used more circumspect language regarding their abilities.

Racist loans

Similarly, a 2018 UC Berkeley study concluded that traditional (face to face) and machine based systems for approving loan applications charged Latin and African American borrowers interest rates 6–9 basis points higher than an equivalent Caucasian borrower. Not only is it morally wrong to make determinations seemingly linked to skin colour (all other criteria being equal), it is also illegal. In this instance, there is a need to be morally fair but also legally compliant (or risk penalties for lending discrimination).

A national political scandal

Or more recently, a relatively simple algorithm met with disaster. It was designed to predict UK students’ A-Level (age 18 school exams) and GCSE (age 16 school exams) results, because students were unable to sit their exams due to COVID-19. The algorithm was heavily biased against state school students (i.e. free schools) vs private school students (i.e. fee-paying schools).

Washington Post - UK Exam Algorithm Scandal
Source: here

The algorithm worked as follows:

  1. Teachers were asked to supply for each pupil: (a) an estimated grade + (b) a ranking compared with every other pupil in their class.
  2. These were put into an algorithm that factored in the school’s performances in each subject over the previous 3 years.

However, this introduces a significant amount of bias. First, private schools historically outperform state schools on exams. This is because they are better funded, selective and typically attract students from highly educated and wealthy families. State schools are less well funded and, on average, attract far more students from poorer and less educated families. As a result, if you were a top performer (an outlier) in a state school, your grade would have been normalised against the previous three year track record for your school. If that track record is high overall (as it tends to be with private schools), no issue. If, however, that track record is lower overall (as it tends to be with state schools), an above average student might be disproportionately downgraded.

Second, the teacher-estimated grades (factor (a) in step 1 above) were given greater weighting by the algorithm if the class size for that subject was fewer than 15. As state school class sizes average around 20–30 students whereas private school classes are circa 5–10 students, private school students’ predicted grades were disproportionately emphasized in the algorithm’s calculation.
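Both effects can be reproduced with a deliberately simplified toy. This is not the actual algorithm used: it assumes grades are plain letters A to E, that large classes simply receive the school's historical grades in rank order, and that small classes keep their teacher-estimated grades outright (an all-or-nothing version of "greater weighting"). The function name `standardise` and all the data are hypothetical.

```python
def standardise(pupils, historical_grades, small_class_threshold=15):
    """Toy grade standardisation, for illustration only.

    pupils: list of (name, teacher_grade, rank), where rank 1 is the strongest.
    historical_grades: letter grades (A to E) previously achieved at the school.
    """
    if len(pupils) < small_class_threshold:
        # Small class: the teacher-estimated grades stand.
        return {name: grade for name, grade, _ in pupils}
    # Large class: teacher grades are ignored; the school's past grades are
    # handed out in rank order, so the class inherits the historical spread.
    ordered = sorted(pupils, key=lambda p: p[2])
    history = sorted(historical_grades)  # "A" sorts before "B", i.e. best first
    result = {}
    for i, (name, _, _) in enumerate(ordered):
        j = i * len(history) // len(ordered)  # same relative position in the history
        result[name] = history[j]
    return result

# Hypothetical school history: 2 As, 8 Bs and 10 Cs.
history = ["A"] * 2 + ["B"] * 8 + ["C"] * 10
# A class of 20 pupils, all estimated at grade A by their teacher.
big_class = [("pupil%d" % rank, "A", rank) for rank in range(1, 21)]
print(standardise(big_class, history))      # only the top two ranks keep their A
print(standardise(big_class[:5], history))  # a class of 5 keeps every A
```

Even in this toy, an exceptional pupil ranked third in a large class cannot receive an A if their school has historically produced only two, while every pupil in a small class keeps the teacher's estimate.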

The result? Millions of students with downgraded results, potentially unfairly. In many cases this has cost students university or college places, and undoubtedly spoilt their futures. After weeks of scandal, the UK government finally admitted it had got it wrong, dispensing with the algorithm’s results altogether and allowing students and universities / colleges to rely on predicted grades.

This was despite repeated pronouncements, only four days before this u-turn, including by the UK’s Prime Minister, Boris Johnson, that:

“Let’s be in no doubt about it, the exam results that we’ve got today are robust, they’re good, they’re dependable for employers”

Boris Johnson, UK Prime Minister and Serial Liar

However, as much as there is a need for explainable AI (an opinion with which we agree), there is a certain sense of irony that AI systems are held to superhuman standards whereas humans are… not.

The explainable AI no one is talking about

A final counter-argument — often made partly in jest, partly in all seriousness — to the need for explainable AI is the lack of explainability regarding human decision making.

How easy is it to explain how and why humans in an organisation do X vs. Y without significant subjectivity?

Human post-hoc explanations of their own behaviour can be grossly inaccurate, with subconscious processes creating narratives to support a positive sense of self vs. objective facts.

This can bleed over into AI, e.g. unconscious human biases in datasets and system design.

Should we also insist upon another type of XAI: Explainable Actual Intelligence? Probably! Especially as those same humans are responsible for designing, safeguarding and improving AI systems.

How to do explainable AI

Thankfully, cleverer people than us are working on this problem. They’ve begun devising lots of new techniques designed to make AI systems more explainable. Examples include the below, organised by model type:

Explainable AI Techniques

An example GAN, as described above:

Source: here
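To give a flavour of what such techniques look like in code, here is a minimal sketch of one common model-agnostic approach, permutation feature importance (not necessarily one of the techniques pictured above). It treats the model as a black box, scrambles one input column at a time and measures how much the error grows. The model, the data and the function names are hypothetical. To keep the result reproducible, the column is rotated by one position rather than randomly shuffled, which is a simplification of the usual technique.

```python
def mean_abs_error(model, rows, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(r) - t) for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, feature):
    """Increase in error when one feature column is scrambled."""
    baseline = mean_abs_error(model, rows, targets)
    column = [r[feature] for r in rows]
    rotated = column[1:] + column[:1]  # deterministic stand-in for a random shuffle
    permuted = [r[:feature] + [v] + r[feature + 1:]
                for r, v in zip(rows, rotated)]
    return mean_abs_error(model, permuted, targets) - baseline

def black_box(row):
    """Hypothetical model: predicted purchases depend on age, not shoe size."""
    age, shoe_size = row
    return 0.8 * age

rows = [[20, 40], [30, 42], [40, 38], [50, 44], [60, 41]]  # [age, shoe_size]
targets = [black_box(r) for r in rows]

print("age:", permutation_importance(black_box, rows, targets, 0))
print("shoe size:", permutation_importance(black_box, rows, targets, 1))
```

Scrambling age damages the predictions while scrambling shoe size changes nothing, which tells us which input the model actually relies on without opening the box.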

The law of explainability

Under the GDPR:

“a decision based solely on automated processing, including profiling, which produces legal effects concerning him or her or similarly significantly affects him or her”

is only permissible where one of the following conditions applies:

  1. The affected individual has provided explicit consent, which has a high threshold under the GDPR.
  2. The automated decision making is necessary for the performance of a contract, e.g. a credit check against a maintained set of databases.
  3. The automated decision making is authorised by law.

Provided these conditions are present, the affected individual is entitled to a “right of explanation” and a right to have human re-evaluation of the decision. When implementing such systems you must inform affected individuals about:

  1. The fact of automated decision making;
  2. The significance of the automated decision making; and
  3. How the automated decision making operates, which has been described as a “right to explainability”.

To comply you must provide “meaningful information about the logic involved”. Certain AI techniques make this challenging to the extent that the algorithms and the processing of data are opaque. Whilst regulators recognise these challenges, guidance suggests that, as an alternative, you will need to provide a full description of the data used plus the aims of the processing and counterfactual scenarios.
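A counterfactual scenario answers the question "what would have had to be different for the decision to change?". The sketch below assumes a hypothetical loan-scoring rule and invented feature names (`income_k`, `debt_k`). It illustrates the idea only and is not a statement of what regulators require.

```python
def approve(applicant):
    """Hypothetical scoring rule, for illustration only (amounts in thousands)."""
    return 0.5 * applicant["income_k"] - 0.8 * applicant["debt_k"] >= 10

def counterfactual(applicant, feature, step, max_steps=1000):
    """Smallest change to one feature, in multiples of step, that yields approval.

    Returns the change needed, or None if no change within max_steps works.
    """
    candidate = dict(applicant)
    for n in range(1, max_steps + 1):
        candidate[feature] = applicant[feature] + n * step
        if approve(candidate):
            return candidate[feature] - applicant[feature]
    return None

applicant = {"income_k": 30, "debt_k": 10}
print("approved:", approve(applicant))
print("income change needed:", counterfactual(applicant, "income_k", 1))
print("debt change needed:", counterfactual(applicant, "debt_k", -1))
```

An explanation of the form "you would have been approved with this much more income, or this much less debt" tells the affected individual something actionable without exposing the full inner workings of the model.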

And surprise, surprise, this very potential legal action was heavily discussed with regard to the UK’s A-Level and GCSE exam algorithm scandal discussed above!

Originally published at lawtomated.



