Unreliable data inevitably leads to flawed conclusions

Appropriateness of the metric of a measurement is immensely important.
unsatisfactory: Data alone is insufficient for the purpose of statistical literacy. iStock

IN his January 2023 address, then UK Prime Minister Rishi Sunak stated that the country needed to reimagine its ‘approach to numeracy’ since we live “in a world where data is everywhere and statistics underpin every job.”

But should data always be trusted? In 2020, then Conservative UK PM Boris Johnson stated that there were 1,00,000 fewer children living in poverty at the time than at the end of the previous Labour government’s tenure. However, Labour leader Sir Keir Starmer claimed that 6,00,000 more children were living in poverty under Conservative rule. Interestingly, the government’s statistics backed both claims, according to Georgina Sturge, a statistician at the House of Commons Library. Thus, both were correct, but how?

Sturge’s 2022 book Bad Data: How Governments, Politicians and the Rest of Us Get Misled by Numbers shows how crucial data, including the government’s own, is riddled with inconsistencies, guesswork and uncertainty. The book is illustrated with data disasters from recent political history, including some of Brexit’s antecedents, and is upfront about the flaws and gaps in the data.

Sturge examines case studies of some of the most contentious topics, including gender disparity, immigration, Brexit, hate crimes, poverty and the state of education and healthcare. There is some dispute about what constitutes poverty, contributing to the discrepancy between Starmer’s and Johnson’s numbers. Sturge queries the definition of poverty. Is it a failure to provide for fundamental needs? And should television and access to the Internet be included among them? Also, should it cover the capacity, say, to pay an unexpected bill of a moderate amount?
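The discrepancy is easy to reproduce in miniature. The sketch below uses entirely made-up incomes to show how a relative poverty line (a fraction of the median income) and an absolute line (a fixed threshold) can move in opposite directions for the same population, which is how two politicians can both be ‘right’.

```python
# Illustrative only: two common poverty definitions applied to the same
# (synthetic) incomes can move in opposite directions.

def relative_poverty(incomes, fraction=0.6):
    """Count people below a fraction of the median income."""
    ordered = sorted(incomes)
    n = len(ordered)
    median = (ordered[n // 2] if n % 2 else
              (ordered[n // 2 - 1] + ordered[n // 2]) / 2)
    return sum(1 for x in incomes if x < fraction * median)

def absolute_poverty(incomes, line=12000):
    """Count people below a fixed income line."""
    return sum(1 for x in incomes if x < line)

# Year 1: low, tightly clustered incomes.
year1 = [9000, 10000, 14000, 15000, 16000]
# Year 2: the poorest gain a little, but the median rises much faster.
year2 = [12500, 13000, 25000, 30000, 35000]

for label, data in [("year 1", year1), ("year 2", year2)]:
    print(label,
          "relative:", relative_poverty(data),
          "absolute:", absolute_poverty(data))
# Relative poverty rises (0 -> 2) while absolute poverty falls (2 -> 0).
```

On these figures, a government citing the absolute line and an opposition citing the relative line would both be reporting the official statistics accurately.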

Neither the UK’s unemployment data of the 1980s nor its crime statistics of the 1990s and 2000s were reliable. Over the past 50 years, the UK government has repeatedly altered the definition of unemployment and the method for registering and counting unemployed people. The claimant count, originally based on union insurance claims, was later expanded to include anyone who was ‘actively seeking work’. Broader metrics are currently in use, covering both those actively seeking employment and the larger group of people who, for various reasons, are not. Sturge notes that the UK Office for National Statistics has recorded nine ‘significant’ changes, making it impossible to compare the series over time.

Understandably, in such a scenario, it is difficult to assess whether the situation is getting better or worse, since we either do not count certain things or do not count them consistently. According to Sturge, irregularities occur in the UK’s crime and health statistics because the same people who are responsible for reducing crime or disease numbers are responsible for recording them. “We don’t know... how many people died from Covid-19 or whether crime is going up or down,” she wrote. Furthermore, disparities in approach often distort the meaning. For instance, data from the telephone interviews used for the crime survey during the Covid-19 pandemic may not be comparable to data from the in-person interviews conducted before and after it.

Besides, if we don’t know the underlying narrative, data just can’t paint the whole picture. Sturge’s book has the lovely historical anecdote of the origin of the term ‘cobra effect’. Apparently, in 19th-century Delhi, a reward was offered for those who caught and killed the cobras that were overrunning the city. The public health risk posed by the venomous snakes, however, persisted even after locals reported wheelbarrow loads of dead cobras to the authorities. It eventually became evident that many people had chosen to breed cobras with the intention of killing them and claiming a reward, and a significant number of these farmed cobras were escaping and attacking people.

Again, data alone is not enough; the appropriateness of the metric is immensely important. For example, rather than revealing general gender differences in pay, the gender pay gap largely reflects the under-representation of women in top posts at firms. So, while helpful, the statistic is not very nuanced. In his 2019 book Bad Data: Why We Measure the Wrong Things and Often Miss the Metrics That Matter, Canadian urban designer Peter Schryvers drew attention to the drawbacks of data analysis and stressed the need to apply appropriate metrics before making key decisions in the environmental, corporate and healthcare sectors.
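A toy calculation makes the point about the pay-gap metric concrete. The salaries and headcounts below are invented: even when every role pays men and women identically, the firm-wide median pay gap comes out large whenever senior posts are held mostly by men.

```python
# Illustrative sketch (synthetic salaries): a firm's median pay gap can
# be large even with identical pay per role, purely because of who
# holds the senior, higher-paid posts.

def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

# Same pay per role for everyone: junior 30k, senior 80k.
JUNIOR, SENIOR = 30000, 80000

# Hypothetical headcounts: senior posts are mostly held by men.
men_pay   = [SENIOR] * 6 + [JUNIOR] * 4
women_pay = [SENIOR] * 1 + [JUNIOR] * 9

gap = (median(men_pay) - median(women_pay)) / median(men_pay)
print(f"median pay gap: {gap:.0%}")  # large, despite equal pay per role
```

The statistic is therefore measuring representation at the top as much as pay, which is exactly why Sturge calls it helpful but not very nuanced.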

Economist Joseph Stiglitz, recipient of the 2001 Nobel Prize, once stated: “What we measure informs what we do. And if we’re measuring the wrong thing, we’re going to do the wrong thing.” Of course, there are tons of other examples beyond those two books on bad data. The GDP, frequently criticised as an inappropriate indicator of growth, is a crucial one. A small number of wealthy individuals account for a disproportionate share of it. Besides, there are more flaws in the GDP calculation process. In mid-2013, the US Bureau of Economic Analysis modified its GDP calculation methodology, making the US economy appear 3 per cent larger overnight. Ghana moved its base year from 1993 to 2006 in 2010, resulting in a GDP jump of 60 per cent and its transformation from a low-income to a lower-middle-income country. Similarly, Nigeria’s GDP grew by 89 per cent all at once when it was ‘rebased’ in 2014; the country surpassed South Africa to become Africa’s largest economy. And all that magic happened without any additional economic activity.
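The rebasing effect is pure arithmetic. In the sketch below (all sector figures invented), updating the base year brings previously uncounted sectors into the measurement, so the GDP level jumps even though nothing in the real economy has changed, much as in the Ghanaian and Nigerian cases.

```python
# Illustrative arithmetic (made-up figures): rebasing re-measures the
# economy with updated sector coverage, so measured GDP can jump
# overnight with no change in real activity.

# Old base year: some sectors barely counted because they hardly
# existed when the survey frame was drawn up.
old_basket = {"agriculture": 40, "manufacturing": 30, "services": 20}

# New base year: the same economy, now with previously uncounted
# sectors surveyed and included.
new_basket = dict(old_basket, telecoms=25, film=10, informal_trade=15)

old_gdp = sum(old_basket.values())
new_gdp = sum(new_basket.values())
print(f"measured GDP jumps {(new_gdp - old_gdp) / old_gdp:.0%} overnight")
```

Nothing was produced between the two measurements; only the yardstick changed.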

Artificial intelligence (AI), too, is not always correct and unbiased. Bad training data or the intrinsic bias of the data used to train AI models is one of the primary reasons for that. Training data should be vetted to be free of racism, sexism and any other form of discrimination. However, it’s almost impossible to guarantee that, particularly since AI requires massive amounts of training data.

Therefore, statistical literacy is important for both our daily lives and politics, and data alone is insufficient for the purpose. However, the more general query still stands: Is it possible to draw any significant conclusions from data that seems flawed? Is no data preferable to bad data?
