When working with data it is tempting to assume that because two variables are correlated, one must cause the other. In fact behavioural psychologists would no doubt opine that this is our default setting: human nature on autopilot makes the assumption automatically. Treating correlation as proof of causation is known as the questionable cause logical fallacy or, as our Latin-loving Foreign Secretary Boris Johnson would no doubt refer to it, cum hoc ergo propter hoc.
The following nicely illustrates that they are not the same: ambulances are seen at the site of most serious traffic accidents, but the ambulances are unlikely to be the cause of the accident themselves. So there is a high correlation between the appearance of ambulances and serious road traffic accidents, but the ambulances do not cause the accidents.
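To put a number on the ambulance example, here is a minimal Python sketch. The ten observations are invented purely for illustration (they are not real accident data), and `pearson_r` is a hypothetical helper written for this post that computes the standard Pearson correlation coefficient r. The thing to notice is that r comes out the same whichever variable you put first, so the figure alone cannot tell you which way any cause runs.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented observations: one entry per accident.
# serious[i] = 1 if the accident was serious, 0 otherwise
# ambulance[i] = 1 if an ambulance was seen at the scene, 0 otherwise
serious = [1, 1, 1, 0, 0, 1, 0, 0, 1, 0]
ambulance = [1, 1, 1, 0, 0, 1, 0, 1, 1, 0]

r_forward = pearson_r(serious, ambulance)
r_reverse = pearson_r(ambulance, serious)

print(f"r(serious, ambulance) = {r_forward:.2f}")
print(f"r(ambulance, serious) = {r_reverse:.2f}")
```

Both lines print the same strong positive value, because the calculation is symmetric. Knowing that serious accidents summon ambulances, and not the other way round, comes from understanding the world, not from the statistic.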
Tyler Vigen, a Harvard Law student, military intelligence analyst and blogger, has written an algorithm that mines unrelated data sets and pairs them based on the similarity of their curves, with hilarious results, particularly if someone were to conflate correlation and causation. The divorce rate in Maine correlates very strongly (r=0.993, where r=1 is perfect correlation!) with the per capita consumption of margarine, and the age of Miss America correlates strongly (r=0.87) with murders by steam, hot vapours or hot objects! But my personal favourite has to be:
Once you understand the difference, the examples above are truly comedic. In nutrition and public health, however, the distinction matters and the confusion is common, given how heavily both fields rely on statistics. For example, sugar consumption and obesity are very closely correlated across the world, but perhaps without the “smoking gun” proof that sugar (alone) causes obesity (although Dr. Robert Lustig would argue differently). Understandably it is a highly charged claim, and given the scientific principle of falsification there has been huge interest in the so-called “Australian Paradox” which, almost uniquely amongst countries where data is available, charts a recent divergence in trends between sugar consumption and obesity, as shown below.
It is the decline in sugar consumption alongside the continued rise in obesity since the year 2000, in other words the break in correlation, that appears to debunk the causation argument worldwide. Now there is a growing school of thought that there is in fact no Australian Paradox and that, shock horror, it may be the result of some tricksy behaviour by Big Business with regard to the data.
However, it is not for me to comment as the point of this particular blog is to illustrate the difference between causation and correlation. Instead I will leave the final word to the esteemed Marion Nestle, the Paulette Goddard Professor in the Department of Nutrition, Food Studies, and Public Health at New York University: