How the pandemic has complicated our hopes for big data
By Nick Ferris
Big data has been hyped as the solution to our problems for years now. Forbes declared in 2019 that we are living in the age of big data, while The Economist stated in 2017 that the commodity of data is “the new oil”. And as tech companies’ management of harvested data has become more sophisticated, so we have become used to Amazon knowing what we want to buy, Netflix knowing what we want to watch, and Google knowing what we want to find out.
We live in an age where the practical collection and processing of information on the ground is no longer necessary – all we need is to access enough data, and process it through the right algorithm. Or so we thought.
One of the many lessons the coronavirus pandemic has taught us is that we cannot yet invest all our hopes in centralised big-data solutions. Unlike an online purchase or an entertainment selection, healthcare is hard to translate into the binary systems that digital methods rely on. A human body’s interaction with a virus can be messy and deceptive. What’s more, the medical systems in which those humans function are typically vast, bureaucratic and underfunded.
But this hasn’t stopped authorities dogmatically pursuing their data-led solutions.
At the start of the pandemic, the NHS teamed up with Palantir, a secretive Silicon Valley data-analysis firm that carries out digital processing of sensitive data for governments across the world. The aim of this partnership, according to the UK government, was to “provide a single source of truth about the rapidly evolving situation,” based on what is recorded at sources such as hospitals, social care providers, and NHS 111 call centres.
There are naturally concerns about the fact that we are witnessing the transferral of NHS data to a private US firm known for winning a number of controversial contracts covering predictive policing and migrant surveillance in the US. There are also fears that Palantir’s datastore could outlive the crisis: the New Statesman revealed in April that the company offered 45 engineers to work on a project for just £1 – sparking fears that the company believes it might eventually be able to profit from the deal.
Other tech firms including Amazon Web Services, Google and Faculty – a London-based AI specialist – were given a slice of the NHS’s Covid data pie, all with the promise that, once the public health emergency has ended, data collected would either be destroyed or returned.
But how useful have the NHS’s centralised data systems been? To understand, let us take the most fundamental metrics recorded by the health service as an indicator: infection and death rates.
100,000 tests per day
According to Public Health England, there have been 374,228 cases and 41,664 deaths in the UK as of 15 September. However, as has been widely reported up to now, these stats must be taken with a bucketful of salt. Even during a normal year, deaths caused by the flu can be misdiagnosed as pneumonia. If that is the situation with a well-known ailment, there are bound to be Covid deaths that go unreported.
Matters in the UK are further complicated by testing capacity: at the start of the pandemic, very limited testing meant that many cases may have gone unnoticed. Similarly, deaths will have occurred and not been recorded as Covid deaths because the patient never received a positive test result.
A Sky News investigation in July revealed just how chaotic the testing systems that data analysts relied upon were. Reporters uncovered hand-written tables of testing data from the height of the pandemic in mid-May, and told of how ministers would call round to collect data before each daily press conference.
When Matt Hancock unveiled his pledge of 100,000 tests per day by the end of April, problems only increased: effort went into sending out as many tests as possible rather than into what happened to the results. According to Department of Health data, four million kits were sent out between April and July, but 2.6 million of them were never processed – meaning around two in three of the tests sent out were wasted.
Because of this failure to manage the virus effectively on the ground, the big datasets collected are unreliable overall, and the government’s great ambitions for big data have been foiled. But there is another data-driven way of understanding the pandemic that does not rely on such error-strewn methods: excess deaths – how many more people have died over the period of the pandemic than in the equivalent period of an average year.
With recorded deaths from Covid-19 currently at very low levels, excess deaths are not a useful measure right now: the metric is only reliable when recorded Covid deaths are high and we can be confident that higher excess deaths are due to the disease.
It is the Office for National Statistics, not Public Health England, that measures excess deaths: between the week ending 13 March and the week ending 21 August, there were around 58,000 more deaths in England and Wales than the 2015-19 average for the same period. Including Scotland and Northern Ireland adds a few thousand more, making 60,000 a safe estimate for excess deaths over the peak of the Covid-19 pandemic.
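In code terms, the excess-deaths metric is simply the sum of weekly differences between recorded deaths and a historical baseline for the same weeks. A minimal sketch, with invented figures rather than the actual ONS data:

```python
# Hypothetical illustration of the excess-deaths calculation.
# The weekly figures below are invented for the example; they are
# not the real ONS registrations or the 2015-19 averages.

def excess_deaths(observed: list[int], baseline: list[int]) -> int:
    """Sum of weekly deaths above the historical average for the same weeks."""
    return sum(o - b for o, b in zip(observed, baseline))

# Three example weeks: deaths recorded during the pandemic vs an
# average-year baseline for the same calendar weeks.
observed_weekly = [12_000, 18_500, 22_400]
baseline_weekly = [10_500, 10_800, 11_000]

print(excess_deaths(observed_weekly, baseline_weekly))  # → 20600
```

The calculation is deliberately crude: it attributes every death above the baseline to the crisis, which is exactly why, as discussed below, the metric cannot separate deaths caused by the disease from deaths caused by the disruption around it.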
But while excess deaths give a better sense of the scale of the impact, the metric is inherently imprecise: it cannot distinguish between those who die of the disease and those who die from factors related to the pandemic, such as delays or disruptions to ordinary health services.
Visits to emergency departments in the United States fell by more than 40%, according to the Centers for Disease Control and Prevention, indicating just how significant an impact the pandemic might be having on people accessing healthcare. Other factors triggered by the pandemic, such as increased incidents of domestic violence or increased prevalence of mental health problems, can also influence excess deaths.
On the flip side, lockdowns and behavioural changes can also limit deaths from causes like other infectious diseases. The global surveillance system FluNet found that this year’s flu season was shortened by a month, likely due to strict lockdowns imposed worldwide.
Fighting a pandemic with big data – a complicated picture
So what is the solution to understanding the pandemic? We cannot forsake data: the subject at hand, concerning the tracking of cases and deaths, is inherently data-driven.
What is clear, though, is that for a group of highly trained computer engineers in an office somewhere to try to model solutions effectively is a fool’s errand, unless enough is invested in traditional resources on the ground to make the data collected reliably comprehensive.
Admittedly, when it comes to excess deaths, little can be changed: it is an imprecise metric that is only really useful in the midst of a crisis such as a war or natural disaster. But when it comes to measured rates of cases and deaths, things can certainly be improved. The UK’s problem has been its pursuit of centralised “big data” solutions, following models set out by organisations in the capital, with too little emphasis placed on localised tracing.
When comparing the UK’s performance with other liberal democracies, Germany appears to be the most appropriate model. From the get-go, the country has focused heavily on developing testing capabilities and track and trace systems that take advantage of the country’s network of 400 local health authorities, which have been doing contact-tracing for years.
Instead of trying to organise a brand new high tech solution, Chancellor Merkel and the prime ministers of each of the sixteen federal states – which are responsible for their own health systems – invested heavily in these traditional health authorities and bolstered manpower. Inexperienced new staff were embedded in existing organisational structures, helping limit some of the difficulties reported in England.
Measured by excess deaths, Germany has recorded around a sixth as many deaths as the UK, despite a shorter and less intense lockdown. Nor are there stories like the one that came out in The Sunday Times earlier in September, of testing chaos in which samples were sent across Europe because of processing failures in the UK.
A silver lining
Data-based solutions in medicine are certainly not finished. The start of September saw Matt Hancock announce a new £50 million investment in new AI products that can turn a patient’s smartphone into a medical-grade device for monitoring kidney disease.
This is an extremely exciting data-driven solution that will hopefully become more common as AI continues to develop.
But at the same time, progress for progress’s sake is never the answer: if a more traditional solution can deliver better results, then new solutions are surely worth re-evaluating.
If one of the outcomes of the pandemic is that the UK government no longer blindly invests in broad, data-driven solutions and instead appreciates the importance of practical, localised systems when it comes to healthcare, then that could well be a silver lining to help government strategy in future crises.
Nick Ferris is an investigative journalist based in London