A British AI Tool to Predict Violent Crime Is Too Flawed to Use

A flagship artificial intelligence system designed to predict gun and knife violence in the UK before it happens had serious flaws that made it unusable, local police have admitted. A coding error led to large drops in accuracy, and the system was ultimately rejected by all of the experts reviewing it for ethical problems.

This story originally appeared on WIRED UK.

The prediction system, known as Most Serious Violence (MSV), is part of the UK’s National Data Analytics Solution (NDAS) project. The Home Office has funded NDAS with at least £10 million ($13 million) during the past two years, with the aim to create machine learning systems that can be used across England and Wales.

As a result of the failure of MSV, police have stopped developing the prediction system in its current form. It has never been used for policing operations and never reached a stage where it could be. Questions have also been raised about the violence tool’s potential to be biased against minority groups and whether it would ever be useful for policing.

The MSV tool was designed to predict whether people would commit their first violent offense with a gun or knife in the next two years. People who had already come into contact with the two police forces involved in developing the tool, West Midlands Police and West Yorkshire Police, were given risk scores. The higher the score, the more likely they would be to commit one of the crimes.

Historical data about 2.4 million people from the West Midlands database and 1.1 million from West Yorkshire was used in the development of the system, with data being pulled from crime and custody records, intelligence reports, and the Police National Computer database.

But as NDAS was starting to “operationalize” the system earlier this year, problems struck. Documents published by the West Midlands’ Police Ethics Committee, which is responsible for scrutinizing NDAS work as well as the force’s own technical developments, reveal that the system contained a coding “flaw” that made it incapable of accurately predicting violence.

“A coding error was found in the definition of the training data set which has rendered the current problem statement of MSV unviable,” an NDAS briefing published in March says. A spokesperson for NDAS says the error was a data ingestion problem that was discovered during the development process. No more specific information about the flaw has been disclosed. “It has proven unfeasible with data currently available to identify a point of intervention before a person commits their first MSV offense with a gun or knife with any degree of precision,” the NDAS briefing document states.

Before the error was found, NDAS claimed its system had accuracy, or precision levels, of up to 75 percent. Out of 100 people believed to be at high risk of committing serious violence with a gun or knife in the West Midlands, 54 of these people were predicted to carry out one of these crimes. For West Yorkshire, 74 people from 100 were predicted to commit serious violence with a gun or knife. “We now know the actual level of precision is significantly lower,” NDAS said in July.
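Precision here means the share of people flagged as high risk who actually go on to commit the offense. A minimal Python sketch of that arithmetic, assuming the West Midlands and West Yorkshire figures above are counts of correct flags per 100 people flagged (the function name `precision` is illustrative, not taken from NDAS):

```python
def precision(correct_flags: int, total_flagged: int) -> float:
    """Share of people flagged as high risk who actually went on to offend."""
    if total_flagged <= 0:
        raise ValueError("precision is undefined when no one is flagged")
    return correct_flags / total_flagged


# Pre-correction figures as reported: correct flags per 100 people flagged.
claimed = {"West Midlands": (54, 100), "West Yorkshire": (74, 100)}
for force, (correct, flagged) in claimed.items():
    print(f"{force}: {precision(correct, flagged):.0%}")
```

Precision says nothing about how many future offenders a model misses; that is measured separately, by recall.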

“Rare events are much harder to predict than common events,” says Melissa Hamilton, a reader in law and criminal justice at the University of Surrey, whose research focuses on police use of risk prediction tools. Hamilton wasn’t surprised there were accuracy issues. “While we know that risk tools don’t perform the same in different jurisdictions, I’ve never seen that big of a margin of difference—particularly when you talk about the same country,” Hamilton says, adding the original estimations appeared to be too high, based on other systems she had seen.
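Hamilton’s point about rare events can be made concrete with Bayes’ rule. The sketch below uses a hypothetical classifier that correctly flags 80 percent of future offenders and correctly clears 95 percent of everyone else; these rates are invented for illustration and are not NDAS figures. Holding them fixed, precision falls sharply as the offense becomes rarer in the population being scored:

```python
def precision_at_base_rate(base_rate: float, sensitivity: float, specificity: float) -> float:
    """Precision implied by Bayes' rule for a given prevalence (base rate)."""
    true_pos = sensitivity * base_rate                # offenders correctly flagged
    false_pos = (1.0 - specificity) * (1.0 - base_rate)  # non-offenders wrongly flagged
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: sensitivity 0.80, specificity 0.95 (illustrative only).
for rate in (0.5, 0.05, 0.01, 0.001):
    p = precision_at_base_rate(rate, 0.80, 0.95)
    print(f"base rate {rate:>6}: precision {p:.3f}")
```

Under these assumptions, when only a small fraction of people ever commit the offense, most of the people a model flags can turn out to be false positives, even though its sensitivity and specificity look respectable.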

As a result of the flaw, NDAS reworked its violence prediction system, and the results showed a significant drop in accuracy. For serious violence with a gun or knife, accuracy dropped to between 14 and 19 percent for West Midlands Police and between 9 and 18 percent for West Yorkshire Police. These rates were similar whether or not the person had committed serious violence before.

NDAS found its reworked system to be most accurate when all of the initial criteria it had originally defined for the system—first-time offense, weapon type and weapon use—were removed. In short, the original performance had been overstated. In the best-case scenario the limited system could be accurate 25 to 38 percent of the time for West Midlands Police and 36 to 51 percent of the time for West Yorkshire Police.

The police’s proposal to take this system forward was unanimously rejected. “There is insufficient information around how this model improves the current situation around decision making in preventing serious youth violence,” the ethics committee concluded in July as it rejected the proposal for the system to be further developed. The committee, which is a voluntary group consisting of experts from different fields, said it did not understand why the revised accuracy rates were sufficient and raised concerns about how the prediction system would be used.

“The committee has expressed these concerns previously on more than one occasion without sufficient clarity being provided, and therefore, as the project stands, it advises the project is discontinued,” the group said in its minutes. Committee members approached for this story said they were not authorized to speak on the record about the work.

Superintendent Nick Dale, the NDAS project lead, says those behind the project “agree that the model cannot proceed in its current form” and points out that it has so far been experimental. “We cannot say, with certainty, what the final model will look like, if indeed we are able to create a suitable model. All our work will be scrutinized by the ethics committee, and their deliberations will be published.”

But multiple people who have reviewed the published NDAS briefings and scrutiny of the violence prediction system by the ethics committee say accuracy issues are only one area of concern. They say the types of data being used are likely to end up with predictions being biased, they have concerns with the normalization of predictive policing technologies, and they cite a lack of evidence of the effectiveness of such tools. Many of these points are also reiterated in questions from the ethics committee to the NDAS staff working on the predictive systems.

“The core problem with the program goes past any issues of accuracy,” says Nuno Guerreiro de Sousa, a technologist at Privacy International. “Basing our arguments on inaccuracy is problematic, because the tech deficiencies are solvable through time. Even if the algorithm was set to be 100 percent accurate, there would still be bias in this system.”

The violence-prediction system identified “more than 20” indicators that were believed to be useful in assessing how risky a person’s future behavior could be. These include age, days since their first crime, connections to other people in the data used, how severe these crimes were, and the maximum number of mentions of “knife” in intelligence reports linked to them. Location and ethnicity data were not included. Many of these factors, a presentation on the system says, were weighted to give more importance to the most recent data.
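The presentation does not say how the recency weighting works. One common approach is exponential decay, sketched below under that assumption: each event (for example, a mention of “knife” in an intelligence report) counts for less as it ages, halving in weight every `half_life_days`. The function name and the one-year half-life are hypothetical choices for illustration, not NDAS parameters.

```python
from datetime import date


def recency_weighted_count(event_dates: list[date], as_of: date,
                           half_life_days: float = 365.0) -> float:
    """Count events, discounting each by exponential decay with the given half-life.

    Hypothetical illustration only: NDAS has not published its weighting scheme.
    """
    total = 0.0
    for d in event_dates:
        age_days = (as_of - d).days
        if age_days < 0:
            continue  # ignore events dated after the reference date
        total += 0.5 ** (age_days / half_life_days)
    return total


# An event on the reference date counts fully; one a year earlier counts half.
mentions = [date(2020, 1, 1), date(2019, 1, 1)]
print(recency_weighted_count(mentions, as_of=date(2020, 1, 1)))
```

A decay scheme like this makes recent contact with police dominate a person’s score, which is one reason critics argue that any bias in recent records carries straight through into the predictions.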

“There are a lot of categories which have been proven in other areas of data analysis in the criminal justice system to lead to unequal outcomes,” says Rashida Richardson, a visiting scholar at Rutgers Law School who has studied data problems in predictive policing. “When you use age, that often skews most predictions or outcomes in a system where you’re more likely to include a cohort of people who are younger as a result of age just being one of the indicators used.” Hamilton agrees. She explains that criminal history factors are often biased themselves, meaning any algorithms that are trained upon them will contain the same issues if a human does not intervene in the development.

“We monitor bias and would not seek to deploy a model that contains bias,” says Dale, the NDAS project lead. “We are committed to ensuring interventions as a result of any model of this type are positive, aimed at reducing criminality and improving life chances, rather than coercive or criminal justice outcomes.”

“The main value in MSV is in testing the art of what is possible in the development of these techniques for policing,” Dale adds. “In doing so, it is inevitable that we will try things for whatever reason, but we are confident that as we progress, we are developing data science techniques that will lead to more efficient and effective policing and better outcomes for all of our communities.”

The current thinking of NDAS is that the predictive violence tool could be used to “augment” existing decisionmaking processes used by police officers when investigating people who are likely to commit serious violence. The violence prediction tool is just one that is being worked on by NDAS. It is also using machine learning to detect modern slavery, the movement of firearms, and types of organized crime. Cressida Dick, the head of London’s Metropolitan Police, has previously said police should look to use “augmented intelligence” rather than relying on AI systems entirely.

However, issues of bias and potential racism within AI systems used for decisionmaking are not new. Just this week the Home Office suspended its visa application decisionmaking system, which used a person’s nationality as one piece of information that determined their immigration status, after allegations that it contained “entrenched racism.”

Last month, in the wake of the global Black Lives Matter protests, more than 1,400 mathematicians signed an open letter saying the field should stop working on the development of predictive policing algorithms. “If you look at most jurisdictions where there is some use of predictive analytics in the criminal justice sector, we don’t have evidence that any of these types of systems work, yet they are proliferating in use,” Richardson says.

These concerns are highlighted in the development of the violence prediction tool. Documents from the ethics committee show one unnamed member of the group saying the coding failure was a “stark reminder” about the risk of AI and tech within policing.

“In the worst-case scenario, inaccurate models could result in coercive or other sanctions against people for which there was no reasonable basis to have predicted their criminality—this risked harming young people’s/anyone’s lives despite the clear warnings—however, it is good to see the team having evaluated its own work and identifying flaws from which to start again,” they wrote in March.

Despite the flaw in the violence predicting system, those who have reviewed it say the setup is more transparent than other predictive policing developments. “The committee’s advice is transparent, robust, and has teeth,” says Tom McNeil, a strategic adviser to the West Midlands Police and Crime Commissioner. The fact that the ethics committee is asking pressing questions and getting answers is largely unheard of in the development of AI systems within policing—much of the development is usually done completely in secret with problems only emerging once they impact people in the real world.

“Just because something can be done computationally doesn’t necessarily mean that it’s always the best way to do it or that it should be done that way,” says Christine Rinik, the codirector of the Centre for Information Rights at the University of Winchester. “That’s why I think it’s so useful to have a process where these steps are questioned.”