Clever Hans 2.0

Still practising

Chris Preece

Tuesday, 27 November 2018

On September 4th 1904 the New York Times ran an article about “Clever Hans”, a horse that could “do almost everything but talk”. Hans was able to perform mathematical calculations, identify musical tones, and spell out the names of individuals. A commission appointed by the German Board of Education found no evidence of trickery, and it was clear that Hans could perform his miraculous calculations even if his owner was no longer in the room. His owner, Wilhelm von Osten, maintained that any horse could be trained to perform similar feats – Clever Hans represented a sparkling new future for the humble horse.

Yet here we are, over 100 years later, and we have yet to see a single horse trained in accountancy, so what went wrong? Well, perhaps unsurprisingly, it turns out Hans wasn’t a genius after all. Instead, Hans was just very good at reading people – he could tell from his questioner’s body language exactly when to stop tapping his hoof, and earn himself a treat. In assuming Hans was solving the problem just the way we would, his owner failed to realise that he had, instead, trained him to identify something entirely different.

I mention this because I’ve found myself thinking of that extraordinary horse a lot recently – not least since discovering the fascinating, if underwhelmingly titled, paper “Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study”. (Stay with me, it’s worth it, I promise.) The aim of the paper was to look at whether an Artificial Intelligence (more specifically, a “Convolutional Neural Network”) developed to interpret chest x-rays in one hospital would be able to perform as well when faced with data from an external site.

The quick answer was no, it didn’t. More interesting, however, was the reason why. What became apparent was that, when dealing with its “usual” hospital, the AI was extremely competent at deducing which imaging equipment had been used, and by extension which department. When the AI was unable to differentiate the origin of the picture, its accuracy dropped significantly. In other words, it looks like the AI may well have been deducing that a patient had pneumonia not because their X-ray showed the classical signs, but because it knew the film in question had just come from the portable machine in ITU. Instead of solving the problem just the way we would, the AI had, instead, trained itself to look at something entirely different. It’s Clever Hans.

The issue, of course, is that, just as “Berlin’s Wonderful Horse” caused press excitement back in 1904, so does Artificial Intelligence now. It’s slowly crept out of science fiction and into our lives. Articles breathlessly tell us about a glistening, error-free future in which we can trust the A.I. to run our lives. Doctors, we are told, may soon be obsolete.

Certainly, that was the view some took when Babylon infamously rolled out their diagnostic AI for the Royal College of Physicians back in June, proudly declaring that it could score 81% on the MRCGP exam. (One can presume it would, however, have failed its clinical skills and workplace assessments.) Subsequently, a stinging rebuke was published in The Lancet, stating: “Babylon’s study does not offer convincing evidence that its Babylon Diagnostic and Triage System can perform better than doctors in any realistic situation, and there is a possibility that it might perform significantly worse.” However, it seems unlikely that a carefully worded statement in a research paper will be sufficient to sweep away the avalanche of publicity created by that first event.

This newfound enthusiasm for letting computers run our lives extends to Government as well. Matt Hancock recently announced that the Government is investing £50 million into five AI development centres across the UK, intended to help with the early identification of disease. We can only hope that these fare better than IBM’s Watson system, which went from being hailed as the future after beating “Jeopardy!” contestants back in 2011, to this year facing accusations that not only did it add very little of value, but in some cases suggested treatments that were actively unsafe.

Of course, there are good reasons for the Government to pursue A.I. as a healthcare solution. Austerity means there’s little to invest in good old-fashioned humans, and despite Mr Hancock’s bizarre statements to the contrary we still have a shortage of doctors. However, the UN’s recent report on poverty and human rights in the UK (an essential, but depressing, read) alludes to another, altogether more sinister motive. The report expresses clear concerns about the Government’s aspirations to use A.I., and about the worryingly opaque automated systems already in use at the Department for Work and Pensions. (The system may flag you as “high-risk” without your knowledge, with no right of reply, or indication of how you might do better.)

It doesn’t take a dyed-in-the-wool conspiracy theorist to see how these concerns could be extended to the use of A.I. in the NHS. Already, as a GP, I am bombarded with pressures to not refer, or to deny access to certain treatments based on dubiously ethical grounds such as BMI. Doctors are frequently disinclined to follow these rules when faced with individuals for whom they are clearly inappropriate – an artificial intelligence, on the other hand, is unlikely to have such foibles. “The computer says no.” Indeed, even if such parameters are not introduced to an A.I. deliberately, artificial intelligence learns from the biases of its teachers. If the message it learns is that the budget is more important than the individual, then we may all have a problem.

What’s more, it won’t necessarily be obvious. The problem with AI is that we don’t really know how it’s learnt something, just that it has. We don’t really know what it’s looking at. We’re back to Clever Hans again.

Despite all of this, I’m not opposed to the idea of Artificial Intelligence per se. The notion that I could have a program on my PC to help me hit on a diagnosis is vaguely appealing. It’s just that, like any other medical advance, it needs to be proven safe, properly understood, and used appropriately. With those caveats satisfied, I’ll be happy to use it alongside all the other tools of my trade.

That’s all it is. A tool. After all, hitting on the diagnosis is arguably the easiest bit of medicine. The hard stuff is all about the ability to read and understand other people, something A.I. remains a long way from mastering, but which, ironically, was the one thing Clever Hans did know how to do. A.I. might be clever, but it’s not as clever as Clever Hans.

And on that note, I’m off to sell Matt Hancock a horse.

Chris Preece

Chris has worked as a GP Partner in North Yorkshire since 2004, and still relishes the peculiar challenge of never quite knowing what the next person through the door is going to present with. He was the chair of his local Practice Based Commissioning Group, and when this evolved into a CCG he joined the Governing Body, ultimately leaving in April 2015. He continues to work with the CCG in an advisory capacity. When not being consumed by all things medical, Chris occupies himself by writing, gaming, and indulging the whims of his children. He has previously written and performed in a number of pantomimes and occupied the fourth plinth in Trafalgar Square. Tragically, his patients no longer tell him he looks too young to be a doctor.