A few nights ago I was watching a funny movie, Barbershop, about a guy trying to keep his father’s shop in Chicago open while under pressure from a gangster who wants to turn the shop into a strip club. I found this old movie because its plot resembles that of something more recent we watched (actually, a series), The Bear, in which a big-time chef comes back to Chicago to save the lousy fast-food joint of his deceased brother. Anyway, while watching the movie I became interested in Ricky, the quasi-bad guy, played by Michael Ealy. If you don’t know this actor (I didn’t), he is a black guy with bright blue eyes. This looks like an unusual combination, and it got me thinking about the odds, and about the frequency with which some genetic traits appear in the population. So I started a little search, first asking ChatGPT what the distribution of eye and hair color is across the Earth’s populations. The lame answer I received (something like, while there may be statistical differences in the prevalence of certain eye and hair colors across ethnic groups, it’s important to recognize the variability and diversity within each population…) told me that the AI is programmed to deflect and put up a rubber wall against any question that may have even a remote racial or ethnic bearing. Therefore, I moved on to a manual search on Google Scholar, to find papers that could say something about this question.

I quickly found some kind of qualitative answer to my question, for example in a few articles by Peter Frost, an anthropologist at Laval University, Quebec. As it happens, most humans all over the Earth have only one hair color and one eye color, both dark brown or black. However, Europeans are a big exception: their hair is black but also brown, flaxen, golden, or red; their eyes are brown but also blue, gray, hazel, or green. This diversity reaches a maximum in an area centered on the East Baltic and covering northern and eastern Europe. According to recent statistics, Latvia is the country with the highest proportion of blue- or gray-eyed people (most respondents to polls said that they are very proud of their eye color and would not want to change it… if that were possible…). If we move outward to the south and east we see a rapid return to the human norm: hair becomes uniformly dark and eyes uniformly brown. Why this color diversity? And why only in Europe? Some believe it to be a side effect of natural selection for fairer skin, to ensure enough vitamin D at the northern latitudes. Yet skin color is known to be only weakly influenced by the different alleles for hair color or eye color. Other theories link light eye colors to intermixture with Neanderthals; however, according to the mitochondrial DNA that has been retrieved up to now, no genetic continuity is discernible between late Neanderthals and early modern Europeans. It is possible that there was some gene exchange between the two groups, but certainly not enough to account for the large number of Europeans with light hair and eyes (see for example this paper by the recent Nobel laureate Svante Pääbo, a typical Swedish blond with blue eyes, and the only living human with a double ä in his name).

Still other theories propose that such color diversity arose through random factors: genetic drift, founder effects, relaxation of natural selection, and more. But these factors could not have produced such a wide variety of hair and eye hues in the 35,000 years that modern humans have inhabited Europe. For example, the hypothesis of a relaxation of selection would need nearly a million years to accumulate the diversity of the seven different alleles just for the different hair colors. Hence, we must look for some kind of non-random process that seems to have targeted just hair and eye color. But how? And why? According to the geneticist Luca Cavalli-Sforza, the answer is sexual selection. This mode of selection intensifies when, for some ecological reason, males outnumber females among individuals ready to mate, or vice versa. The sex in excess supply has to compete for a mate, and resorts to the same strategies that advertisers use to grab attention, such as the use of bright or striking colors. Rare-color advantage has been studied mainly in guppies and fruit flies but also occurs in other animals. A potential mate will respond not simply to a bright color, but also to a rare one that stands out from the crowd. By enhancing reproductive success, however, such a color will also become more common and less eye-catching. Sexual attraction will then shift to less common variants, the eventual result being an equilibrium that maximizes color diversity. But then, why is hair and eye color so much more diverse in Europe than elsewhere? For some reason, sexual selection must have been much stronger among ancestral Europeans than in other human populations. Here Frost tells a long story about the evolution of the climate since the last Ice Age and the progressive shrinking of the tundra area, which broadly coincides with the Baltic-Scandinavian-Northern Europe area of blond- and red-haired, blue-, green- and gray-eyed people. The humans in this area would have been under fierce ecological pressure, e.g. because of more difficult hunting and higher death rates among young males, with the resulting excess of females leading to increased sexual selection (for some Darwinian beauty reason, ancient European girls considered blue-eyed hunters more successful and/or attractive).
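To make the rare-color argument a bit more concrete, here is a toy numerical sketch of how an advantage for rare variants drives the population toward maximal diversity. Everything in it is my own assumption for illustration: the fitness rule `w = 1 + s * (1 - p)`, the bonus `s`, and the four starting frequencies do not come from Frost's or Cavalli-Sforza's work.

```python
def next_generation(freqs, s):
    """One generation of selection in which rarer variants are favored."""
    # Assumed (hypothetical) fitness rule: w_i = 1 + s * (1 - p_i), so a
    # rare color (small p_i) gets close to the full mating bonus s.
    fitness = [1.0 + s * (1.0 - p) for p in freqs]
    mean_fitness = sum(p * w for p, w in zip(freqs, fitness))
    # Standard replicator update: frequency times relative fitness.
    return [p * w / mean_fitness for p, w in zip(freqs, fitness)]


def simulate(freqs, s=0.5, generations=500):
    """Iterate the selection step and return the final frequencies."""
    for _ in range(generations):
        freqs = next_generation(freqs, s)
    return freqs


if __name__ == "__main__":
    # One dominant color and three rare ones (illustrative numbers).
    start = [0.97, 0.01, 0.01, 0.01]
    for gen in (0, 5, 20, 100):
        print(gen, [round(p, 3) for p in simulate(start, generations=gen)])
```

In this toy model the rare colors gain ground as long as they are rare, and the frequencies settle where all variants are equally common, which is the diversity-maximizing equilibrium described above. It says nothing about how strong the effect was in real populations.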

The geneticist Thomas Thelen published an empirical study in 1983, still widely cited, that provided an important clue to the idea of sexual selection and mating attractiveness. He prepared three series of slides featuring different women: one with 6 brunettes; another with one brunette from the first group and 5 blondes; and a third with the same brunette and 11 blondes. Male subjects then had to select the woman in each series they would most prefer to marry. For the same brunette, the preference increased significantly from the first to the third series, that is, in proportion to the rarity of the “target”. In other words, mating pressure tends to favor the rarest individual in the group, regardless of whether the color is dark or light. Given the larger variability among Europeans, the rare-color preference may account for the range of human hair and eye phenotypes we see in this particular geographic area and, conversely, could also explain the more homogeneous range of colors in geographic areas where such variability is more limited. Interestingly, there are examples in which different phenotypes have appeared independently, such as blond hair in a group of central Australian Aborigines, brown hair among the Yukaghir of eastern Siberia, and fair hair among some Inuit bands of the western Canadian Arctic. However, it remains to be seen why a similar sexual preference did not arise, for example, in the human populations of North America that were subject to Ice-Age ecological constraints similar to those of their distant Northern European brothers.

While these qualitative explanations are undoubtedly very suggestive and fascinating, the quantitative side of such anthropological studies is what always keeps a physicist puzzled. What is the statistical value of such studies, carried out on limited population samples, either because data are scarce or because they are inaccessible? How repeatable are such social-based analyses, and with what margin of confidence? I will therefore conclude today’s letter with a question about the value of statistical significance in scientific deduction, a subject that has been repeatedly raised in recent times (see for example this comment in Nature, which was signed by more than 800 scientists in different areas). Social, economic, and anthropological studies, psychometry and psychology, drug and vaccine testing, and medical cohort reports regularly use the notion of ‘statistical significance’, associated with the P-value and its classical threshold of 0.05. The most common use is to test a binary hypothesis, such as: does this happen because of that, or not? So, you make a test of this, playing with or without that, and rank the respective probabilities. The occurrence of this when that is not there is the null hypothesis, meaning that this happens randomly for whatever cause, and that is not at all implicated. In practice, the P-value measures the tail(s) of the statistical distribution expected under the null hypothesis: it is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. If this probability is larger than a conventional threshold (typically 0.05), then the null hypothesis cannot be rejected. However, it should be obvious that a statistically non-significant result does not prove the null hypothesis. One cannot conclude that there is ‘no difference’ or ‘no association’ just because a P-value is larger than the classical threshold (or, equivalently, because a confidence interval includes zero). In particular, two studies do not conflict just because one had a statistically significant result and the other did not.

Think for example of an experiment in which some observable A is measured while changing the parameter X, e.g. from zero to 1. One team finds a value of A=3±3.2 when X=1, and A=0 when X=0, and concludes that the parameter X has no effect on the observable; their P-value was 0.095. Another team, using a more refined apparatus, finds a value of A=3±1.5 when X=1, and concludes that X has instead a strong influence on A; their P-value is 0.001 and is considered statistically significant. Big scandal and conflict: both teams want to be right. However, both teams measured exactly the same value, 3, for A when X=1. The correct interpretation should have been that the first experiment was not accurate enough to draw any conclusion, and not that it supported a null conclusion based on the notion of statistical non-significance. This could look totally obvious, but several analyses of the scientific literature conducted at different times (see for example here, here, here, and here) showed that a large number (in some cases about half) of the articles published in various journals contain wrong interpretations or misunderstandings of this type, when using the P-value as an absolute guide. Hence the claim in the Nature comment cited above, asking to step back from the absolute confidence in confidence intervals. As can easily be imagined, this comment opened the classic can of worms, and spurred many reactions on both sides of the stadium.

There are domains of science in which measurements are intrinsically not strictly repeatable (how can you ensure that two different samples of young males looking at brunettes vs blondes will be attracted by the same features…?). Maybe one should replace the label “statistically significant” with something like “consequential results”, to stress the search for a causal relation (or lack thereof) between events.

European weirdness
