Using artificial intelligence to rule on handball is a tantalising possibility | Jonathan Wilson

A shot from West Ham’s Ryan Fredericks hits Ezri Konsa of Aston Villa on the top of the arm but no handball is given. Photograph: Javier García/BPI/Shutterstock

Comparative judgment used in marking essays could improve decisions in football and help restore common sense

How should an essay be marked? You might think a teacher should simply read it and make a judgment based on the impression it makes: logically coherent, offers evidence to back up its case, reads well, is original – feels like an A. But that, obviously, is risky. It’s subjective. What stirs one assessor might not appeal to another.

So maybe there needs to be an agreed rubric. The essay must cover certain key points, achieve certain goals. But the danger then is that essays become box-ticking exercises: a student could doggedly go through the checklist and achieve top marks despite making little sense, or a brilliant essay might omit one point and so be marked down.

As Daisy Christodoulou, director of education at No More Marking, points out, the debate between the spirit and the letter of the law is ancient, and has relevance to modern football. In Mark’s gospel, Christ and the disciples are criticised by the pharisees for breaching the prohibition on working on the sabbath by picking some heads of grain. To which Christ replies that the sabbath was made for man, not man for the sabbath. Christ implies that the specific laws are less important than the spirit that underlies them; for the pharisees the laws are what make religion: to rely on the spirit is to tolerate rule-breaking and self-indulgence.

In football, the two ends of the spectrum tend to be referred to as “consistency” and “common sense”. We – fans, journalists, players, managers – feel instinctively that an inconsequential and non-deliberate nudge on a player moving away from goal is too trivial to be punished with a penalty, a three-quarters chance of a goal, and so call for common sense, and yet we want that common sense somehow to be universal in application, for referees also to “feel” the situation as we do.

VAR has not changed that, but it has made the issue more fraught. Once, inconsistencies could be written off as inevitable consequences of the pace of the game; we understood, up to a point, that referees chasing along behind the play could not be expected to see everything and so allowed them some latitude. But VAR extends the illusion of perfectibility. If we can see everything from a multitude of angles and slow it down, should we not be able to agree on a decision? Very clearly, we cannot.

Take handball, in some ways the simplest of laws. There is no issue of a level of acceptable force, or which player initiated contact, or whether something may conceivably have endangered an opponent – there is a ball, there is a hand and there is a question of intent. And yet since the introduction of VAR, football has tied itself in knots trying to decide what a handball is.

The VAR board announcing a check for handball during the Premier League match between Aston Villa and Newcastle United at Villa Park. Photograph: Serena Taylor/Newcastle United/Getty Images

As an example, take the handball for which Ivan Perisic was penalised in the 2018 World Cup final as a corner glanced off Blaise Matuidi, who was a foot in front of him, and on to his arm, which was very slightly extended from his body as he landed having jumped for a header. Not merely did Perisic have no time to react, he was unsighted. By the letter of the law, perhaps it was a handball, but it felt wrong, a game turned by the random bounce of a ball on to an opponent who had no way of avoiding it – and that in part explains the constant revisions to and extension of the law over the past few years.

The handball law used to comprise only 20 words, with three advisory bullet-points to help referees decide what might constitute “deliberate”. The law as it stands is 252 words long, none of it advisory, and also includes a diagram to explain the point at which the shoulder becomes the arm.

But adding text doesn’t necessarily clarify the issue; rather it risks adding more scope for interpretation. In marking, Christodoulou favours a process called comparative judgment, by which teachers are given a series of match-ups between a randomly selected pair of the essays under consideration. They judge which of each pair is better, without using a mark scheme. Each essay is judged several times in different match-ups by different assessors, and all the judgments are then combined to provide a mark and a rank order.
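For readers curious how a pile of pairwise verdicts can become a rank order, here is a minimal sketch in Python. It assumes a Bradley-Terry model, one common statistical way of combining paired comparisons; the article does not say which method No More Marking actually uses, and the essay labels and judgments below are invented purely for illustration.

```python
from collections import defaultdict


def bradley_terry(judgments, iterations=200):
    """Estimate a strength per item from a list of (winner, loser) pairs.

    Assumption: a Bradley-Terry model fitted by simple iterative updates.
    This is an illustrative stand-in, not a documented No More Marking method.
    """
    items = sorted({item for pair in judgments for item in pair})
    if not items:
        return {}

    wins = defaultdict(int)      # how often each item was preferred
    meetings = defaultdict(int)  # how often each pair was compared
    for winner, loser in judgments:
        wins[winner] += 1
        meetings[frozenset((winner, loser))] += 1

    # Start with every item equally strong, then refine repeatedly.
    strength = {item: 1.0 / len(items) for item in items}
    for _ in range(iterations):
        updated = {}
        for i in items:
            denom = sum(
                meetings[frozenset((i, j))] / (strength[i] + strength[j])
                for j in items
                if j != i and frozenset((i, j)) in meetings
            )
            updated[i] = wins[i] / denom if denom else strength[i]
        # Rescale so the strengths always sum to 1.
        total = sum(updated.values())
        strength = {item: value / total for item, value in updated.items()}
    return strength


# Hypothetical judgments: each tuple is (preferred essay, other essay),
# possibly made by different assessors. Every pair is compared twice.
judgments = [
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("A", "D"),
    ("B", "C"), ("C", "B"),
    ("B", "D"), ("B", "D"),
    ("C", "D"), ("D", "C"),
]

strengths = bradley_terry(judgments)
ranking = sorted(strengths, key=strengths.get, reverse=True)
print(ranking)
```

No assessor ever assigns a grade here. The strengths, which could be rescaled into marks, emerge only from who was preferred to whom, so one assessor's quirk (the single verdict for D over C, say) is diluted by everyone else's judgments.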

Quite apart from the practical benefits – the process is apparently much quicker than traditional marking – it allows for the subjectivity of “knowing” what a good essay is while making judgment less reliant on the individual view of a single teacher.

No More Marking has used this approach to assess half a million pieces of student writing over the past few years. Christodoulou sees the mental challenge of applying a mark scheme to an essay as similar to applying a rule book to a handball incident and believes comparative judgment could improve decisions in football, too.

Take a panel of stakeholders – referees, managers, players, journalists, fans – show them a series of pairs of handball incidents and ask them to judge which one is more deserving of a free-kick. A consensus would build: some incidents would obviously be handballs and some would obviously not, while others would be less clear-cut.

That would at the very least refine the discussion, using practical examples rather than convoluted verbal descriptions, and could then be used to amend the law and the way it is explained, not only to referees but to players and the public.

But what could revolutionise refereeing is what comes next. The technology is not yet sufficiently advanced but, before too long, with sufficient data, it is at least theoretically possible that footage of a handball decision could be examined not by a VAR official but by artificial intelligence, which could then access the previous incident most similar to the one under discussion and see whether a majority of the panel had decided that it should be deemed an offence. There would then be consistency within the common-sense instincts of the panel, the letter of the law effectively being conditioned by the spirit of the law. We are not there yet, but the possibility is tantalising.
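In outline, the lookup described above might work something like the Python sketch below. Everything in it is hypothetical: the three numeric features, the example incidents and the vote counts are invented, and a real system would have to extract far richer descriptions from video footage. It shows only the final step, finding the closest previously judged incident and following the panel's majority.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class Incident:
    label: str
    # Hypothetical features, each scaled 0 to 1:
    # (arm extension, time to react, ball deflected off another player)
    features: tuple
    votes_offence: int
    votes_no_offence: int


def most_similar(query, library):
    """Return the stored incident whose features are closest to the query."""
    return min(library, key=lambda inc: math.dist(query, inc.features))


def rule(query, library):
    """Follow the panel's majority verdict on the most similar past incident."""
    precedent = most_similar(query, library)
    is_offence = precedent.votes_offence > precedent.votes_no_offence
    return precedent, is_offence


# Invented library of incidents already judged by a 20-person panel.
library = [
    Incident("arm raised high, clear sight of ball", (0.9, 0.8, 0.0), 18, 2),
    Incident("arm by side, point-blank deflection", (0.1, 0.1, 1.0), 1, 19),
    Incident("arm slightly out, short range", (0.4, 0.3, 0.0), 9, 11),
]

# A new incident: arm barely extended, no time to react, ball deflected.
new_incident = (0.2, 0.1, 1.0)
precedent, is_offence = rule(new_incident, library)
print(precedent.label, "->", "offence" if is_offence else "no offence")
```

The consistency comes from the library rather than from any written rule: two near-identical incidents will always be matched to the same precedent and so receive the same verdict.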