Lawyer Challenge – The Results are in and the Machine has won

by Nico Kuhlmann

Last weekend, the results of CaseCrunch’s week-long competition to directly pit lawyers against artificial intelligence were announced. With an accuracy of 86.6%, compared to the lawyers’ accuracy of 62.3%, CaseCrunch emerged victorious.

The Format of the Competition

Throughout October, 112 lawyers pre-registered to participate in the Lawyer Challenge, which ran from 20th to 27th October. They were presented with factual scenarios from payment protection insurance (PPI) mis-selling complaints. For each one, they were asked to predict “yes or no” whether the Financial Ombudsman would uphold the claim. CaseCrunch was given the same factual scenarios, and whoever achieved the higher accuracy would win. In total, participants submitted 775 predictions.

A Technical Judge and a Legal Judge independently verified the fairness of the competition. The Legal Judge was Felix Steffek (LLM, PhD), Lecturer in Law at the University of Cambridge and Co-Director of the Centre for Corporate and Commercial Law. The Technical Judge was Ian Dodd, UK Director of Premonition.

The factual scenarios were taken from real cases decided by the Financial Ombudsman Service and published under the Freedom of Information Act (FOIA). All identifying details, such as the names of the parties, case names and dates, were removed, leaving only the facts. Lawyers made their predictions in an unsupervised environment and were permitted to use all available resources. PPI mis-selling was chosen as the basis of the competition because it matched the background of most participating lawyers and is an area of law that is easier to learn than many others. Participants were given links to the Financial Conduct Authority’s rules setting out the basis on which the Ombudsman decides complaints.

Felix Steffek attested: “The factual descriptions of the problems set by the Financial Ombudsman Service are a reasonable basis for a prediction about PPI mis-selling complaints being upheld or rejected by the Ombudsman at an early stage in the advisory process. Trained lawyers from commercial London law firms, using all the tools and resources they usually work with, are able to make reasonable predictions about these problems at this point even though the information given per claim varies and further information might be revealed at later stages.”

The Results of the Competition

Altogether, 112 lawyers competed in the Challenge, including Magic Circle partners, barristers and in-house counsel. Participating law firms included Eversheds Sutherland, Pinsent Masons, Bird & Bird, Kennedys, Allen & Overy, Berwin Leighton Paisner, DLA Piper, DAC Beachcroft, Weightmans and others, with some firms entering “teams” of lawyers into the competition.

The lawyers achieved an overall accuracy of 62.3%. CaseCruncher Alpha, the system CaseCrunch entered into the competition, scored a validation accuracy of 86.6%.

Ian Dodd, UK Director of Premonition, noted: “The session I observed produced an accuracy of 86.6%. It would also be interesting to put a £ value on the processing cost. The real number of ‘Human: 62.3% at £300p/h and X hours’ compared to ‘AI: 86.6% at £17p/h and X hours’ is the true bottom line.”

Jozef Maruscak, Managing Director, said: “We could not be happier about the outcome. We are grateful to all involved parties, especially competing lawyers who were not afraid to participate. We are not necessarily adversaries in this game – systems like ours can make the legal world more effective for everyone. I am convinced that we have now reached the point where our technology and expertise allow us to satisfy both our vision and our commercial interests. We are looking forward to finding solutions for our clients.”

Rebecca Agliolo, Marketing Director, commented: “Ultimately, the Challenge wasn’t about ‘winning or losing’; it was about showcasing the potential of artificial intelligence and changing the current paradigm not by talking, but by doing. The Lawyer Challenge started as an idea, and spiralled into a vision. Like any vision, it can’t belong to a single person. We hope that the Challenge will be replicated and improved – and we are proud to get the ball rolling.”

Ludwig Bull, Scientific Director, noted: “Evaluating these results is tricky. These results do not mean that machines are generally better at predicting outcomes than human lawyers. These results show that if the question is defined precisely, machines are able to compete with and sometimes outperform human lawyers. The use case for these systems is clear. Legal decision prediction systems like ours can solve legal bottlenecks within organisations permanently and reliably.”