Correlation does not equal causation. Or, does it?

Of all the branches of mathematics that most directly touch our daily life, statistics is probably the one worst understood by laymen. We have all heard jokes like “you can twist numbers to prove pretty much anything”, or the infamous “according to statistics, if you eat a chicken and I starve, we both had half a chicken each”. Even such simple concepts as relative and absolute risk are presented by the media in a way that makes it difficult for the average person to get a correct idea. People often say “correlation does not equal causation”, meaning essentially that just because two things frequently happen together, it doesn’t follow that one causes the other. But sometimes we take this too far, and act as if correlation meant nothing at all. As I have done several times already in these ‘fun-physics’ letters, I also suspect that our teaching bears some responsibility, and that we scientists are not always immune to problems and concerns that may end up involuntarily (…I hope) misleading public opinion.

A few years ago the Swiss doctor Franz Messerli, then at Columbia University's St. Luke’s Hospital, published an article titled “Chocolate consumption, cognitive function, and Nobel laureates” [N Engl J Med 367 (2012) 1562]. Apparently he did the study just for fun, but he ended up discovering a noticeable correlation between a country's chocolate consumption (chocolate is known to contain flavanols, a class of flavonoids thought to improve blood flow to brain cells) and the number of Nobel prizes that country has obtained. To date the paper has received more than 300 citations and quite some media attention, and it has also spurred additional research. Dei, Heeren and Maurage [PLoS One 9 (2014) e92612] stated that scientific over-activity is a better predictor of Nobel chances than diet or GNP. On the other hand, András Folyovich and colleagues protested [Orv. Hetil. 160 (2019) 26] that Hungary had not been included in Messerli’s study, and that their data clearly demonstrated that Hungary does not fit the scheme. To be fair, Messerli himself did point out some special ‘outlier’ cases, like Sweden. Given its consumption of 6.4 kg per capita per year, Sweden should have produced about 14 Nobel laureates, but got 32 instead, about two standard deviations off the predicted value. Jokingly, he suggested that the Nobel Committee, based in Stockholm, could have some inherent patriotic bias or, perhaps, that the Swedes are particularly sensitive to chocolate, so that even minuscule amounts greatly enhance their cognitive skills. Pierre Maurage and colleagues [J. Nutrition 143 (2013) 931] had already warned about the perils of over-interpreting correlations in health studies, by making the same comparison with the consumption of other flavanol-rich foods. But Aloys Prinz [Soc. Sci. Human. 2 (2020) 82] recently found that the correlation persists even after controlling for important factors, such as GNP, number of publications per year, or total R&D expenditure by country.
Besides being an intellectual divertissement, Messerli’s paper forces us to ask whether science, as a global social activity, isn’t sometimes too hasty in jumping to unsupported conclusions.
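The statistical trap behind the chocolate story can be reproduced in a few lines. The sketch below is purely illustrative and uses synthetic, randomly generated data, not Messerli's: a hypothetical hidden factor, here called `wealth`, drives both `chocolate` and `nobels`, which have no direct link to each other. The raw Pearson correlation comes out high, while the partial correlation (regress both variables on the confounder, then correlate the residuals) drops to about zero. This is only in the spirit of the "factoring out" done by Prinz; his actual covariates and method may differ.

```python
import numpy as np

# Synthetic data only: a hypothetical hidden factor drives both variables.
rng = np.random.default_rng(seed=1)
n_countries = 500                                        # many fake "countries" for stable statistics
wealth = rng.normal(size=n_countries)                    # hypothetical confounder
chocolate = wealth + 0.5 * rng.normal(size=n_countries)  # depends on wealth only
nobels = wealth + 0.5 * rng.normal(size=n_countries)     # depends on wealth only


def pearson(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


def residuals(y, z):
    """Residuals of a least-squares straight-line fit y ~ a + b*z."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def partial_corr(x, y, z):
    """Correlation between x and y after removing the linear effect of z."""
    return pearson(residuals(x, z), residuals(y, z))


r_raw = pearson(chocolate, nobels)
r_partial = partial_corr(chocolate, nobels, wealth)
print(f"raw correlation:     {r_raw:.2f}")      # roughly 0.8 with these settings
print(f"partial correlation: {r_partial:.2f}")  # close to 0
```

The converse is what makes Prinz's result interesting: when a correlation survives such controls, the hidden factor (if any) is not among the variables that were controlled for.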

I have already told you some of the adventures of the Good Ol’ Days, when I was working around the nuclear reactors at Cadarache. Today’s story about probability and statistics, however, reminds me of something else. Some time around A.D. 1986 (I reckon some of you dear readers were not born yet) we were running an experimental campaign at the Masurca zero-power reactor. As it happens, somebody noticed that every now and then another pigeon was found dead, always at more or less the same corner outside the reactor building. After three or four dead pigeons (I never knew the exact number) the case became an issue. Could the pigeons have been killed by a radioactivity leak? Everybody said the chance was practically zero, but around a nuclear facility you never know. So the health-hazard team started taking measurements, to make sure that we could not be charged with illegal pigeon hunting. I still use this little story as an exercise in probability for my students, since it is a good example of low-signal detection in a noisy environment.

Imagine that you make a measurement with a Geiger counter and find a very low signal, not much above what you think should be the natural background, but not quite zero either. First, the background problem can be solved right away. Since the intensity of the (possible, hidden) point source falls off as the inverse square of the distance, you make two measurements, at distances D and 2D from the suspect spot, which give C1 and C2 counts, respectively. The background B should be the same everywhere, hence the actual counts S from the source are easily deduced from C1 = S + B and C2 = S/4 + B, which give S = 4(C1 - C2)/3 and B = (4C2 - C1)/3. Now, after background subtraction, the net counts C turn out to be barely above zero, something like C = X ± X. Looks like a zero to me… So what do you think: is the source there, or not? As a radiation protection expert (that’s me talking to students) you must give an upper bound to the risk factor, and the statistical analysis of confidence intervals comes to the rescue when you need to determine the upper bound of a practically undetectable signal.
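A minimal sketch of that background separation, assuming equal counting times at D and 2D, a perfectly uniform background, and independent Poisson counts (so that the variance of each count equals the count itself). The function name `source_and_background` and the counts 110 and 102 are hypothetical, chosen only to reproduce a "C = X ± X" situation; the uncertainties come from ordinary linear error propagation.

```python
import math


def source_and_background(c1, c2):
    """Separate source and background from counts at distances D and 2D.

    Assumes C1 = S + B and C2 = S/4 + B (inverse-square law, uniform
    background, equal counting times) and independent Poisson counts,
    so that Var(C) = C.
    """
    s = 4.0 * (c1 - c2) / 3.0
    b = (4.0 * c2 - c1) / 3.0
    # Linear propagation of the Poisson uncertainties on C1 and C2
    sigma_s = (4.0 / 3.0) * math.sqrt(c1 + c2)
    sigma_b = math.sqrt(c1 + 16.0 * c2) / 3.0
    return s, sigma_s, b, sigma_b


# Hypothetical counts, chosen only to mimic a "barely there" signal
s, sigma_s, b, sigma_b = source_and_background(110, 102)
print(f"S = {s:.1f} +/- {sigma_s:.1f}")   # S = 10.7 +/- 19.4, compatible with zero
print(f"B = {b:.1f} +/- {sigma_b:.1f}")   # B = 99.3 +/- 13.9
```

Note how the subtraction amplifies the noise: the uncertainty on S is larger than S itself, which is exactly why a proper upper bound is needed.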

You know that your detector has an overall counting efficiency E; for the sake of example, let’s put E = 0.001, and let’s ask what is the maximum activity of the supposed radioactive source with, e.g., 95% confidence. Suppose that the source aimed N particles at the detector during the measurement time T, for example T = 1 minute. Each particle is either counted or not (a ‘yes’ or ‘no’ outcome, with probabilities E and 1 - E), so the number of counts follows a binomial distribution, and the probability of recording zero counts is the binomial term with k = 0, that is C(N, 0) E^0 (1 - E)^N, which as you can easily verify is simply (1 - E)^N. The larger N, the less likely it is to see nothing at all. Now ask for which N this probability drops to 5%: setting (1 - E)^N = 0.05 gives N = ln 0.05 / ln(1 - E) ≈ 3000, for the above value of E. This means that if at least 3000 particles reached the detector in a minute, you should see a non-zero count in 95% of measurements, and zero counts in only 5% of them: out of 100 repeated measurements, no more than about 5 would give 0. But given that in the real experiments you always detected zero (net) counts, the unknown source must be sending fewer than about 3000 particles per minute toward the detector, at 95% confidence. And that’s the best you can say about it.
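The upper-limit calculation can be checked numerically. This is a sketch assuming, as in the text, that each of the N particles is detected independently with probability E; the helper `max_particles` is a hypothetical name. The second half is a Monte Carlo check using NumPy's binomial generator: with N set at the limit, about 5% of simulated measurements should record nothing. As a side remark, N times E comes out close to -ln 0.05 ≈ 3, the familiar Poisson "rule of three" for zero observed events.

```python
import math

import numpy as np


def max_particles(efficiency, confidence=0.95):
    """Largest N for which zero counts still has probability >= 1 - confidence.

    Solves (1 - E)**N = 1 - confidence for N.
    """
    return math.log(1.0 - confidence) / math.log(1.0 - efficiency)


E = 0.001                  # counting efficiency used in the text
n_max = max_particles(E)   # about 2994, i.e. "about 3000"
print(f"N_max = {n_max:.0f} particles per measurement time")

# Expected counts at the limit: N*E, close to -ln(0.05), roughly 3
print(f"expected counts at the limit: {n_max * E:.2f}")

# Monte Carlo check: many simulated measurements with N = n_max particles,
# each detected with probability E; count how often nothing is seen.
rng = np.random.default_rng(seed=0)
counts = rng.binomial(n=round(n_max), p=E, size=200_000)
frac_zero = float(np.mean(counts == 0))
print(f"fraction of zero-count runs: {frac_zero:.3f}")   # close to 0.05
```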

Another subject in data analysis that I see being taught less and less to bachelor students is Bayesian statistics. The expression of Bayes’ theorem is deceptively simple, but the consequences of its simple formula are quite far-reaching. It writes the probability of an event A conditional on a (likely related) event B, here written P(A:B) and read ‘probability of A given B’, as the ratio P(A:B) = P(B:A) x P(A) / P(B). A classical example is when you have two categories with partly shared characteristics, and you look for the probability of belonging to one category given that characteristic, in random sampling. Imagine a school with 60% boys and 40% girls, and assume that girls can wear pants or skirts (say, half and half), while boys wear only pants. If you now pick at random a student wearing pants, what is the probability that this is a girl? A “stupid” calculation would say: you have 40% girls, half of them wear pants, therefore the probability is 20%. Instead, old Thomas Bayes says: let us write P(girl:pants) = P(pants:girl) x P(girl) / P(pants). Now, P(pants:girl) on the right-hand side of the equation is in fact 50%, that is, the probability that a girl wears pants rather than a skirt; then, P(girl) is the probability of picking a girl among all the students, or 40%. Finally, P(pants) is only slightly more complicated: it is the total fraction of students wearing pants, that is 100% of the boys, equal to 0.6, plus 50% of the girls, equal to 0.2, for a total of 0.8. Then P(girl:pants) = 0.5 x 0.4 / 0.8 = 0.25, that is 25%. Clearly different from the “stupid” 20%, which is in fact the probability of picking a girl in pants among all students, not among those wearing pants.
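The school example translates directly into code. This sketch uses Python's `fractions` module so that the results are exact; the helper `bayes` is a hypothetical name that just applies the formula above, and the 50% share of girls in pants is the assumption made in the text. A brute-force count on a hypothetical school of 1000 students gives the same answer.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A:B) = P(B:A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# School example from the text, with exact fractions
p_girl, p_boy = Fraction(2, 5), Fraction(3, 5)
p_pants_given_girl = Fraction(1, 2)   # assumed: half of the girls wear pants
p_pants_given_boy = Fraction(1)       # all boys wear pants

# Law of total probability for the denominator P(pants)
p_pants = p_pants_given_boy * p_boy + p_pants_given_girl * p_girl
p_girl_given_pants = bayes(p_pants_given_girl, p_girl, p_pants)
print(p_pants, p_girl_given_pants)    # 4/5 1/4

# The "stupid" answer is the joint probability P(girl and pants)
p_girl_and_pants = p_pants_given_girl * p_girl
print(p_girl_and_pants)               # 1/5

# Brute-force check on a hypothetical school of 1000 students
boys_in_pants = 600
girls_in_pants = 400 // 2
print(Fraction(girls_in_pants, boys_in_pants + girls_in_pants))   # 1/4
```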

In the case of Covid-19, there have been endless discussions about the “pandemic of the non-vaccinated”, meaning that after the large nation-wide vaccination campaigns, the only people actually at risk should have been the non-vaccinated. This thesis was disputed by “denialists”, who pointed out that the number of people seen in intensive care was very nearly the same among vaccinated and non-vaccinated. A possible application of Bayes’ theorem is to deduce the probability of ending up in intensive care (IC) after receiving the standard doses of vaccine (V), as P(IC:V) = P(V:IC) x P(IC) / P(V). The formula says that the probability for a vaccinated individual to enter IC is equal to P(V:IC), the fraction of vaccinated people that we actually observe among IC patients, divided by the fraction of vaccinated people in the population, P(V), and multiplied by the absolute probability of ending up in IC for any reason, that is, P(IC). The same calculation must also be done for the non-vaccinated (NV). The last term, P(IC), may be rather complicated to obtain: we should measure the probability of getting into intensive care for any possible reason other than Covid-19, on the basis of age, health condition, job, lifestyle etc. In general we could ignore this factor, since it should have the same value for both V and NV people. However, during the peak of the SARS-CoV-2 epidemic, in heated discussions and TV talk shows many “denialists” claimed that once vaccinated you put yourself at risk of getting any other kind of unwanted illness, so you would be voluntarily increasing your P(IC) compared to a non-vaccinated person… let us set this factor aside just for a moment.

The ratio P(V:IC) / P(V), that is, the fraction of vaccinated people among IC patients divided by the fraction of vaccinated people in the population, is a coefficient (NOT a probability) that tells how much the risk of ending up in IC is increased, or decreased, for a V individual. We can of course calculate the same quantity for NV individuals, that is P(NV:IC) / P(NV). By comparing the two, we immediately see what Bayes’ theorem is telling us: even if the values of P(V:IC) and P(NV:IC) were similar (which, however, was not entirely true), the denominators of the two ratios are very different. P(NV) is so much smaller than P(V) that the two probabilities are not even comparable, the probability of requiring intensive care being in fact enormously larger for NV than for V people.
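To get a feeling for the size of the effect, here is a small sketch with purely hypothetical numbers (they are not data from any vaccination campaign): 90% of the population vaccinated, IC patients split 50/50 between V and NV as in the denialists' argument, and an arbitrary overall P(IC) of 1 in 1000. The helper `ic_risk` is a hypothetical name that simply applies Bayes' theorem as written above. The last line checks that the two conditional probabilities recombine into P(IC) through the law of total probability; P(IC) cancels out in their ratio, so its arbitrary value does not affect the comparison.

```python
def ic_risk(p_group_given_ic, p_group, p_ic):
    """Bayes' theorem: P(IC:group) = P(group:IC) * P(IC) / P(group)."""
    return p_group_given_ic * p_ic / p_group


# Hypothetical inputs, for illustration only
p_v = 0.9            # fraction of vaccinated people in the population
p_v_given_ic = 0.5   # fraction of vaccinated people among IC patients
p_ic = 1e-3          # overall probability of ending up in IC (arbitrary)

p_nv = 1.0 - p_v
p_nv_given_ic = 1.0 - p_v_given_ic

risk_v = ic_risk(p_v_given_ic, p_v, p_ic)
risk_nv = ic_risk(p_nv_given_ic, p_nv, p_ic)

# Coefficients from the text (NOT probabilities): > 1 means above-average risk
coef_v = p_v_given_ic / p_v
coef_nv = p_nv_given_ic / p_nv
print(f"coefficient V:   {coef_v:.2f}")
print(f"coefficient NV:  {coef_nv:.2f}")
print(f"NV/V risk ratio: {risk_nv / risk_v:.1f}")   # about 9 with these inputs

# Consistency check (law of total probability): P(IC) = P(IC:V)P(V) + P(IC:NV)P(NV)
assert abs(risk_v * p_v + risk_nv * p_nv - p_ic) < 1e-12
```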

On the other hand, and more interestingly, this “Bayesian” calculation makes it possible to point out inconsistencies clearly. If, just for the sake of argument, it turned out that the probability P(IC:V) is indeed larger than P(IC:NV), then, given the small denominator P(NV) and assuming that the people in intensive care are split 50/50 between V and NV, it would mean that for some reason the factor P(IC) (which we temporarily set aside in the above discussion) has increased substantially after vaccination. In other words, in this case vaccination would expose you to an increased risk of many other troubles, apart from Covid. And if that were true, by now, I think we should have noticed it…

the physics of sundays
