Algorithmic ethics: lessons and limitations for leaders

To unleash automation’s decision-making potential we must examine its limitations


In the not-too-distant past, we made decisions based entirely on human judgement. Now, automated systems are helping people call important shots. Financial institutions use algorithms to offer loan applicants immediate yes-no decisions. Recruitment firms adopt systems powered by language technology to match applicants to vacancies. Even the criminal justice system uses predictive algorithms when sentencing criminals.

There are hundreds of examples like these, and algorithmic automation of decision-making is only set to rise. Why? First, computational power is becoming cheaper thanks to Moore’s “Law” – the observation that the number of transistors on an integrated circuit, a measure that correlates with computational power, doubles roughly every two years (a figure often quoted as 18 months). Second, we’re creating smarter algorithms that can transform raw and unstructured data into digestible information, affecting everything from digital to financial health. Third, we simply have more data. Every aspect of our lives is blazing a digital trail that can be mined to better understand human behaviour – and predict the future.

The opportunity

Some people argue that algorithms can never match human ability in making decisions because they focus too narrowly on specific tasks. But are humans so perfect? People can be influenced by how they feel: one study of more than a thousand court decisions showed that judges are more lenient after lunch. People can be slow: JP Morgan Chase cut 360,000 hours of routine finance work to a matter of seconds with a system that stopped 12,000 mistakes made by human error every year. People can be selfish: research led by Madan Pillutla, Term Chair Professor of Organisational Behaviour at London Business School, suggests that even the fairest, most well-intentioned person can be prone to discriminate – they hire based on what’s in it for them.

Three limitations

It’s true that algorithms do not suffer from human imperfections, such as being tired, error-prone or selfish. But there are limitations when we rely on algorithms to make decisions, and it’s important we understand what they are:

1. Transparency: algorithms are a black box – it’s difficult to know if they’re fit for purpose if we don’t know how they work
2. Bias: algorithms are trained to make recommendations based on data that’s not always representative – systematic biases can go unnoticed and these biases can proliferate over time
3. Accuracy: we treat algorithms as infallible – in reality, they’re only designed to work well on average.

An example from the criminal justice system can help us examine these limitations in action. Eric Loomis was arrested in the US in February 2013, accused of driving a car that had been used in a shooting; he pleaded guilty to eluding a police officer. The trial judge in Wisconsin sentenced him to six years in prison. Part of the decision-making process was COMPAS (which stands for Correctional Offender Management Profiling for Alternative Sanctions), a proprietary algorithm used to calculate the likelihood that someone will reoffend. COMPAS measures categories such as criminal personality, substance abuse and social isolation. Defendants are ranked from 1 (low risk) to 10 (high risk) in each category.

Loomis challenged the judge’s use of the COMPAS score because unlike other evidence used against him, his defence team could not scrutinise the algorithm – the first limitation. The judge’s reliance on the score, the factors considered and the weight given to data in the decision-making process were all grounds for his appeal, which reached the Wisconsin Supreme Court in July 2016.

The second limitation, that algorithms are only as good as the data used to train them, is also an issue. COMPAS uses crime data, which is essentially arrest data. Arrest data often relies on police being in the right place at the right time. If an area is notorious for petty crime, police are more likely to attend, make an arrest and report the data. The information is then used to predict future crimes. The problem? At some point, this becomes self-perpetuating. What if a crime goes unreported because a neighbourhood has a reputation for upholding the law? What if the police simply aren’t able to attend? No police, no arrests, no data.
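The self-perpetuating cycle described above can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not a model of COMPAS or of any real police force: the two areas, their identical true incident rates, the starting arrest counts and the rule "send the only patrol to the area with the most recorded arrests" are all hypothetical.

```python
def simulate_patrols(recorded, true_rate, rounds):
    """Toy feedback loop: arrests are only recorded where the patrol goes.

    recorded  -- hypothetical starting arrest counts per area
    true_rate -- hypothetical true incidents per round in each area
    rounds    -- number of patrol rounds to simulate
    """
    recorded = dict(recorded)  # copy so the caller's data is untouched
    for _ in range(rounds):
        # The patrol goes wherever the existing data says crime is highest.
        target = max(recorded, key=recorded.get)
        # Only incidents in the patrolled area enter the data.
        recorded[target] += true_rate[target]
    return recorded


# Both areas have the same true incident rate; area A starts one arrest ahead.
start = {"A": 11, "B": 10}
rate = {"A": 10, "B": 10}
result = simulate_patrols(start, rate, rounds=20)
print(result)
```

In this sketch a one-arrest head start is enough for area A to attract every patrol, so area B generates no new data at all even though the underlying behaviour is identical: no police, no arrests, no data.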

Our human biases become part of the technology we create

My research with Yiangos Papanastasiou (Berkeley) and Kostas Bimpikis (Stanford) demonstrates that the world of recommendations based on consumer reviews also suffers from such self-perpetuating cycles. For example, a positive customer review of a hotel on TripAdvisor will lead to more people wanting to visit the establishment. More customers, more reviews, more social proof. In this way, a hotel’s success can become self-reinforcing. But what about a less-explored alternative option? Even a superior hotel will find it difficult to prove its worth if it receives less attention.

Self-perpetuating cycles can create bias in algorithms. Sometimes what leads to bias is much simpler. Imagine you were trying to teach a computer about facial recognition by taking your own set of Facebook friends as a starting point. Computers find patterns in data; that’s how machine learning works. Problem is, your friends’ faces don’t represent everyone on earth. Unintentionally, by starting from information based on your personal network, you will create an algorithm that recognises faces of people that look like your friends but fails to recognise faces that look different to them. The same can be said for data that is only representative of the past but not of the present or the future. In this way, our human biases become part of the technology we create. Tech giants such as Google are beginning to educate people about such bias, but still have a way to go.

Finally, even if the data used to train an algorithm is free from the biases described, algorithms may still prove problematic. Why? Because users rely on the recommendations too much.

First, we tend to forget that predictions are not always perfect. Going back to the COMPAS example, an independent audit by ProPublica, an investigative journalism non-profit, found that defendants with the worst possible score (10 out of 10) still have a 20% chance of not reoffending. Conversely, defendants with the best possible score (1 out of 10) still have a 25% probability of reoffending (much higher than the general public). So algorithms can make mistakes.

There’s a mathematical limit to how fair any algorithm can be

Second, algorithms inevitably make predictions that discriminate against groups that are different from the average. Remember, COMPAS rates defendants based on more than 100 factors such as criminal history, sex and age. It doesn’t take race into consideration. Indeed, when mathematically scrutinised, it appears that the algorithm does not discriminate by race – someone classified as high risk has the same probability of reoffending irrespective of race. So the algorithm’s prediction accuracy does not vary by race, yet, when further scrutinised, the results are imbalanced. A greater share of black defendants (a minority with a higher recidivism rate than average) who do not reoffend are classified as high risk compared to the average. Given the former, the latter is inevitable: there’s a mathematical limit to how fair any algorithm can be.

How so? Sam Corbett-Davies, Emma Pierson, Sharad Goel (all at Stanford) and Avi Feller (at Berkeley), explain further in this article. Black defendants are more likely to be classified as high risk. Yes, race isn’t part of the data used, but factors that predict reoffending vary by race. For example, if previous convictions and propensity to reoffend are correlated, and if more black defendants have previous convictions, then an algorithm that uses previous convictions to calculate risk will rate black defendants as higher risk than white defendants. The point? If a greater proportion of black defendants are classified as high risk, and if high-risk classification is not always perfect, then a greater share of black defendants who do not reoffend will be classified as high risk. And this is despite the algorithm’s prediction accuracy being independent of race. This is far from fair, and it’s mathematically inevitable.
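The arithmetic behind this argument can be checked with a small worked example. It is a sketch under invented numbers, not COMPAS data: assume a "high risk" label means an 80% chance of reoffending and a "low risk" label a 20% chance in every group (so the score is equally accurate for everyone), and that two hypothetical groups differ only in the share of members labelled high risk.

```python
def false_positive_rate(share_high, p_high=0.8, p_low=0.2):
    """Share of non-reoffenders who were labelled high risk.

    share_high -- fraction of the group labelled high risk (hypothetical)
    p_high     -- reoffending probability given a high-risk label (assumed)
    p_low      -- reoffending probability given a low-risk label (assumed)
    """
    non_reoffend_high = share_high * (1 - p_high)        # labelled high, did not reoffend
    non_reoffend_low = (1 - share_high) * (1 - p_low)    # labelled low, did not reoffend
    return non_reoffend_high / (non_reoffend_high + non_reoffend_low)


# Same meaning of "high risk" in both groups; only the share labelled high differs.
fpr_group_a = false_positive_rate(share_high=0.6)
fpr_group_b = false_positive_rate(share_high=0.3)
print(f"Group A: {fpr_group_a:.1%} of non-reoffenders labelled high risk")
print(f"Group B: {fpr_group_b:.1%} of non-reoffenders labelled high risk")
```

With these assumed figures, roughly 27% of group A's non-reoffenders carry a high-risk label against roughly 10% for group B, even though a high-risk label means exactly the same thing in both groups.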

Four lessons for people (not machines)

It’s true that algorithms can generate biased outcomes that, if left unchecked, can amplify over time. But they do the job they’ve been designed to do. If we crave objectivity and consistency, let’s put the onus back on people to improve the design, use and audit of algorithms.

1. Use algorithms as one part of the decision-making process

Stanislav Petrov, who passed away in May 2017, was a lieutenant colonel in the Soviet Union's Air Defense Forces, and to some he’s known as the man who saved the world. On 26 September 1983, an algorithm using satellite data told Petrov that the US had launched five nuclear-armed intercontinental ballistic missiles. In one interview, he said, “The siren howled, but I just sat there for a few seconds, staring at the big, back-lit, red screen with the word 'launch' on it.” His gut feeling at the time was that something wasn’t right; if this was all-out nuclear war, why were so few missiles being launched? He went against protocol and didn’t press the “button”. It turned out that there had been a computer malfunction, and by not escalating the alarm he avoided a disastrous nuclear war.

The missile-detection system was surely helpful to the Soviet (and NATO) military. But as Petrov helped design it, he knew the limitations. With great consideration, he decided to rely on his own judgement more than the information gleaned from an intelligent computer.

Beyond missile defence systems, it’s important to recognise that algorithms can make mistakes. Whenever we design decision tools that use algorithms, we have the responsibility to create processes that empower people to still apply their judgement and common sense.

2. Educate users on the limitations of algorithms

As Petrov’s example shows, algorithms are not a silver bullet (or silver missile for that matter). Calculations are based on the average; they offer probability, not certainty. While Petrov understood this, it’s not clear that most algorithm users – programmers, business leaders, policymakers – do. Certainly, more can be done to educate people on the responsible and effective use of algorithms through formal education and on-the-job training.

Moreover, algorithmic recommendations must be presented in such a way that makes the limitations lucid. For example, one of COMPAS’ inadequacies is that it rates defendants on a 1–10 scale. If instead the results were presented as a percentage it would reflect probability, chances and odds rather than a fait accompli. What’s more, being scored 10 out of 10 suggests that a person is 10 times more likely to reoffend than someone receiving the best possible score of 1 out of 10. This is not the case. There is a greater probability, but it’s actually only 3.8 times higher.
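The gap between a 1–10 scale and the underlying probabilities can be made concrete with a short calculation. This sketch uses only the rounded figures quoted earlier in this piece (a 20% chance of not reoffending at score 10 and a 25% chance of reoffending at score 1); with those rounded inputs the ratio comes out near 3.2, so the 3.8 quoted above presumably rests on unrounded data, which is an assumption on our part.

```python
def relative_risk(p_high_score, p_low_score):
    """How many times more likely reoffending is at one score than at another."""
    return p_high_score / p_low_score


# Rounded figures quoted in this article, not raw COMPAS data.
p_score_10 = 1 - 0.20   # 20% chance of NOT reoffending at the worst score
p_score_1 = 0.25        # 25% chance of reoffending at the best score

ratio = relative_risk(p_score_10, p_score_1)
print(f"Score 10: {p_score_10:.0%}, score 1: {p_score_1:.0%}, ratio: {ratio:.1f}x")
```

Either way, the ratio is far below the tenfold difference that a 10-versus-1 score seems to imply, which is the case for reporting probabilities rather than ranks.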

3. Prepare for the audit of algorithms

In the way that manufacturing and service firms expect auditors to come knocking, so too should firms producing and deploying algorithms. In 2015 in the US, the Federal Trade Commission, which protects American consumers, created the Office of Technology Research and Investigation. The initiative was launched to help ensure that customers enjoy the benefits of technological progress without suffering from the risk of unfair practices. As a trusted information source, the office conducts independent studies, evaluates new marketing practices and provides guidance to consumers, businesses and policymakers.

4. Protect your customers and employees by offering the right of human review

A new, stricter European data law is coming into force in May 2018: the General Data Protection Regulation (GDPR). It will build on the existing Data Protection Directive and force firms to handle data better. One job of the GDPR is to safeguard people against the risk that a potentially damaging decision is taken without human intervention. So, for instance, if you’re rejected for a loan because your online application was declined automatically by an algorithm, you have the right to human review. The credit provider must then go through manual underwriting processes and check the original decision.

This signals a seismic shift and it’s a topic receiving attention worldwide, such as from German chancellor Angela Merkel who calls for “less secrecy” and more “informed citizens”.

The bigger picture

Let’s rewind to Loomis, who pleaded guilty to fleeing from an officer in 2013. After Loomis appealed his sentence, the case was referred to the Wisconsin Supreme Court. The attorney general’s office defended the use of COMPAS. In July 2016 the Wisconsin Supreme Court ruled that the algorithm can be used as one component of the decision-making process to aid sentencing decisions.

The court challenged the judges to understand COMPAS’s limitations to prevent misuse of the system; knowing what your algorithm can tell you is as important as knowing what it can’t. Petrov, whose actions avoided World War III, taught us the same: despite the algorithm’s big flashing press-me button, he didn’t. Petrov made the ultimate judgement call. The challenge for everyone affected by algorithms now is to ensure that we push the debate – the lessons, the limitations, the opportunities – on algorithmic ethics forward.


When should we rely on algorithms?

A feedback loop is a powerful mechanism that allows developers to continuously improve the design of decision-making algorithms. Take self-driving cars. When a self-driving car makes a mistake and causes an accident or violates a traffic law, the algorithm designers know about it. With that knowledge, they can scrutinise the sequence of events that led to the error – lane position, road conditions, acceleration – and can redesign the algorithm to avoid repeating the same mistake in the future. Arguably, sentencing convicted criminals is not an exercise that lends itself to such feedback loops. There’s a long time lag between making a decision (sentencing a convicted criminal) and discovering whether the outcome was correct (whether or not they reoffended), especially if the sentence is lengthy.

So when should we rely on algorithms? When mistakes can be observed and feedback is reliable, fast and actionable. For everything else? Circle back to the tenets in this piece: use the algorithm as one part of the decision-making process; understand the algorithm’s limitations and present algorithmic output in a way that reflects them; be prepared for audit; and promote the right to human review. Place value on human judgement and a test-and-learn approach.


