Genome-wide Association Studies (GWAS) compare the complete DNA makeup of large populations to identify genetic variants associated with observable traits, known as phenotypes. The first GWAS was published in 2005, and as of 2020, more than 4,500 GWAS had been published for nearly 5,000 unique traits or diseases. This scale has greatly improved our ability to identify disease-causing variants, but only for select populations.

As of 2022, 78% of individuals profiled in the GWAS Catalog (a free database that compiles data from existing GWAS) were of European ancestry. Only 2% were of African ancestry. From a social standpoint, the inequity is clear. But from a scientific standpoint, why is this important?

Human genomes are 99.9% identical, and yet that remaining 0.1% of dissimilarity contains hundreds of thousands of genetic variants, resulting in a vast amount of human diversity. Genetic diversity can serve many important purposes. High genetic diversity allows for greater flexibility in responding to stimuli, disease resistance, and environmental adaptation. People of African descent hold by far the highest degree of genetic diversity in their genomes. In fact, there is more genetic diversity between neighboring African populations than between African and Eurasian populations. Why is that?

Modern humans originated on the African continent as many as 100,000 years ago, and only began migrating to other continents about 50,000 years ago. This migration out of the African continent resulted in the first of many large-scale genetic bottlenecks we know of today. A genetic bottleneck results in a loss of genetic diversity in new populations compared to the originating population. Species naturally produce increased genetic diversity over time via several mechanisms; the longer a group exists, the longer it has to generate increased diversity.

[Image: two large overlapping circles. The top circle, labeled "Parent population with significant genetic variation," is filled with smaller red, yellow, orange, blue, green, and purple circles. The bottom circle, labeled "Post-bottleneck population where all but the green and yellow alleles have been lost," contains only small yellow and green circles flowing in from the top circle. A large blue arrow labeled "Bottlenecking event" points to the area where the two circles overlap.]
Schematic depicting genetic bottlenecks. Colored dots represent genetic diversity. “Bottleneck Effect” by Tsaneda is licensed under CC BY 3.0.

Let’s say you have 10 genetically distinct groups in Africa, and members from 4 of these groups become part of a migration to present-day Saudi Arabia. The 10 original groups in Africa continue to live there, interbreed, and increase in genetic diversity. But the migrants now form a population with less diversity than the one they left. They have lost the diversity of the 6 other founder groups and can only interbreed with members of the 4 migrated groups, resulting in decreased levels of diversity.
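The scenario above can be sketched as a toy calculation. This is a minimal illustration, not a population-genetics model: the group sizes, allele labels, and the helper names `make_groups` and `distinct_alleles` are all hypothetical, and the sketch ignores allele frequencies, new mutations, and interbreeding over time. It assumes each group carries a few alleles shared by everyone plus a few found only in that group.

```python
import random


def make_groups(n_groups=10, n_shared=5, n_private=3):
    """Build hypothetical groups, each represented as a set of allele labels.

    Every group carries the same shared alleles plus its own private ones.
    The counts are arbitrary values chosen for illustration.
    """
    shared = {f"shared_{i}" for i in range(n_shared)}
    groups = []
    for g in range(n_groups):
        private = {f"group{g}_private_{i}" for i in range(n_private)}
        groups.append(shared | private)
    return groups


def distinct_alleles(groups):
    """Count the distinct alleles present across a list of groups."""
    return len(set().union(*groups))


rng = random.Random(0)  # fixed seed so the choice of migrating groups is repeatable

source_population = make_groups()               # the 10 original groups
migrants = rng.sample(source_population, 4)     # members of 4 groups migrate

print("Alleles in source population:", distinct_alleles(source_population))
print("Alleles in migrant population:", distinct_alleles(migrants))
```

Under these assumptions the source population holds 35 distinct alleles (5 shared plus 3 private alleles in each of 10 groups), while the migrants carry only 17 (5 shared plus 3 private alleles in each of 4 groups), whichever 4 groups happen to leave. The private alleles of the 6 groups that stayed behind are simply absent from the new population.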

There have been several genetic bottleneck events across human history. Notably, the bottlenecks that occurred when humans migrated to the North American continent, and later to the South American continent, were the most recent (only about 10,000 to 15,000 years ago), resulting in the lowest levels of genetic diversity in Indigenous American populations. So people of African descent have greater genetic diversity. What does this mean for modern GWAS?

Large-scale profiling of African-descent genomes would more accurately measure the extent of genetic diversity that we see in the world today. Profiling only European genomes leaves a world of genetic variation completely unexplored, resulting in an inability to identify variants found in non-European populations, and making much of the information produced by current GWAS inapplicable to populations of any other ancestry group.

Some scientists argue that these studies include only European individuals because including multiple populations introduces statistical difficulties. Technically, this is true. But if that were the only concern, and if we were truly concerned with profiling the broad spectrum of human genetic diversity, 78% of the individuals profiled in the GWAS Catalog would be of African descent, not European.


Peer Editor: Anna Goddard
