Rebecca Compton Ph.D.

Adopting Reason

DNA Tests for Ethnic Ancestry in Adoption: A Skeptic’s View

Consumers should be aware of the limitations in DNA ancestry testing.

Posted Jul 08, 2016

A new trend has emerged among adoptive parents eager to understand their children’s ancestry: commercial, direct-to-consumer genetic testing by companies such as 23andMe. Individual genetic testing can have varied purposes: for example, to identify specific biological relatives, to test for genetic risk factors for medical conditions, or to determine ethnic ancestry. It is this last purpose that this essay addresses, because it is the most questionable use of genetic testing. Can an analysis of a person’s DNA, obtained through a simple cheek swab, tell us the ethnicity of that person’s ancestors?

My adopted son, for example, was born in the Central Asian country of Kazakhstan, and has essentially unknown ethnic ancestry according to the usual method (family history). His features look “Eurasian,” which is not surprising, given that Kazakhstan is located in the borderlands between Europe and Asia. Kazakhstan is also, like most countries, multi-ethnic. Modern-day labels for the diverse groups of people presently residing in the nation of Kazakhstan include terms like Kazakh, Uzbek, Tajik, Kyrgyz, Russian, Ukrainian, Tatar, Chechen, Korean, and German, to name a few. So can genetic testing tell us with certainty whether our son has, for example, Kyrgyz or Chechen or Tajik ancestry?

The answer, bluntly, is no. The tests may be able to estimate that he has both Asian and European ancestry, but we guessed that already; and much more than that is pure conjecture, no better than astrology, as one science-and-society institute report put it. (For additional skepticism from geneticists, see this article). While the limitations of the genetic-ancestry tests are complex, here we focus on a few key points: (1) the social construction of ethnic labels that are being mapped onto biology; (2) the limited geographical and cultural representation of the world’s people in existing genetic databases; and (3) the incomplete, fragmented record of the past in any person’s DNA.

DNA Double Helix/Andrea Laurel/CC BY 2.0

First, let’s begin with the ethnic labels themselves. The commercial companies use a somewhat circular method to map ethnic groups onto genes. Respondents who send in their DNA samples can indicate, via self-report, how they categorize their own ethnicity. In many cases, the respondents can choose from a set of predetermined categories and subcategories (for example, Asian, subdivided into Chinese, Vietnamese, Korean, etc.). In some cases, the respondent may be able to suggest a new label that is not already part of the database. But in either case these labels come not from the DNA, but from people’s beliefs about what categories they think they belong to. Those beliefs are then mapped onto biology and used to make inferences about how to label other people who have similar biological markers.

Imagine that we chose to establish a new category called “Pennsylvanian.” Everyone who sends in their DNA could check a box to indicate whether they are “Pennsylvanian.” Then, the company could construct a profile of the DNA sequences most strongly statistically associated with the people who call themselves “Pennsylvanian,” and the next round of participants could be told, in their ethnicity report, the extent to which their DNA matches “Pennsylvanian.” On the surface, it should seem dubious to propose that “Pennsylvanian” people could have a biologically shared ancestry (beyond that which all humans share), because people currently living in Pennsylvania had ancestors who came from many places in the world. Of course, if most Pennsylvanians are white, then the template of “Pennsylvanian” will skew towards people whose ancestors came from Europe. But even if a probabilistic template could be constructed to represent the DNA of the typical Pennsylvanian, there is no DNA segment that could categorically sort people into those who are Pennsylvanian and those who are not.
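The circularity of the “Pennsylvanian” thought experiment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any company’s actual method: the four markers, the sample values, and the function names `label_profiles` and `similarity` are all invented for the example.

```python
def label_profiles(samples):
    """Average each marker over the people who chose each self-reported label.

    samples: list of (label, markers) pairs, where markers is a list of 0/1 values.
    Returns {label: [frequency of marker 0, frequency of marker 1, ...]}.
    """
    sums, counts = {}, {}
    for label, markers in samples:
        if label not in sums:
            sums[label] = [0] * len(markers)
            counts[label] = 0
        sums[label] = [s + m for s, m in zip(sums[label], markers)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def similarity(markers, profile):
    """Mean agreement (0 to 1) between one person's markers and a label's profile."""
    agreement = [p if m == 1 else 1 - p for m, p in zip(markers, profile)]
    return sum(agreement) / len(markers)


# Hypothetical data: the labels come from a checkbox, not from the DNA itself.
samples = [
    ("Pennsylvanian", [1, 1, 0, 1]),
    ("Pennsylvanian", [1, 0, 0, 1]),
    ("Other", [0, 0, 1, 0]),
    ("Other", [0, 1, 1, 0]),
]
profiles = label_profiles(samples)

newcomer = [1, 1, 0, 0]
for label, profile in profiles.items():
    print(label, similarity(newcomer, profile))
```

In this toy data the newcomer scores 0.625 against “Pennsylvanian” and 0.375 against “Other.” The result is a graded resemblance to whoever happened to check each box, not a categorical sorting of people into Pennsylvanians and non-Pennsylvanians.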

But what makes “Pennsylvanian” seem a dubious ethnic label, while “Kazakh” or “Polish” seem valid? They are all categories that are loosely rooted in geography but also tied to particular time periods and political boundaries. Pennsylvania didn’t even exist a few hundred years ago, so that makes it especially nonsensical as a category that might map onto ancestral biology. Fair enough, but then what timeframe is relevant? Which historical period do we harken back to in order to determine “real” ethnic categories? Ethnicity is a moving target, a set of social labels that changes over historical time. There is no magical time in the past (during the approximately 50,000 years since humans first migrated out of Africa) when people had clear, categorical biological ethnicities.

Even if ethnicities were inherently fixed (which there is every reason to doubt), another problem with DNA ancestry testing is the bias in the populations that are being sampled by commercial genetic tests. If you are white or Chinese, you are in luck: your population has been heavily sampled in the existing databases, and you are likely to have “matches” in the database that make some sense. If you are from a more obscure ethnicity (perhaps one that is rarer, or one that is dominant in a part of the world where people are not seeking their identities through $200 DNA kits), you are less likely to have a reasonable match in the database. The “match” is only as good as the database, and the databases are skewed in their sampling of the world’s population. The company 23andMe promises to give ancestry reports comparing your DNA to “31 populations worldwide.” That sounds impressive, but how does it compare to the number of ethnicities that actually exist worldwide? It’s not easy to count ethnicities because no one agrees about what defines an ethnicity in the first place. But, as a rough proxy, linguists estimate that more than 5,000 languages are spoken worldwide.

This state of affairs can lead to scenarios that seem nonsensical on the surface. For example, a person born in Kazakhstan may be told that she has a partial match with Native American ancestry. Does this mean that she is partly descended from Cherokees who somehow found their way to the steppes of Central Asia? No. It means that a certain small segment of her DNA statistically matches that of people who label themselves Native Americans today. Most likely, this match is due to the fact that around 20,000 years ago, the ancestors of the people we now call Native Americans crossed a land bridge from Siberia to Alaska. Modern-day Kazakhs and modern-day Native Americans are both descended (in part) from groups that inhabited northeast Asia millennia ago; some of them turned right and went across the land bridge, and others turned left and went further into Asia.

The fact that “Native American” can turn up as a partial match for a person born in Central Asia points to several limitations in existing commercial tests. It illustrates the mutability of ethnic labels: neither “Native American” nor “Kazakh” existed 20,000 years ago; these are modern terms. It illustrates the sampling bias in the ethnic categories that are included: because “Kazakh” is not a reportable label in 23andMe (nor are most other Central Asian ethnicities), all the algorithm can do is give the next-best matches, which may include not only Native American, but also East Asian and European subgroups. Finally, the partial DNA “match” between an ethnic Kazakh and a Native American points to the vast difference in the timescale of “ancestry” that is embedded in genes (on the order of tens of thousands of years) versus the timescale of “ancestry” that people usually have in mind when they consider family history (perhaps a couple hundred years at most).
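To see why a missing label forces a “next-best” answer, consider a minimal Python sketch. Everything in it is hypothetical: the panel labels, the marker frequencies, and the function `rank_matches` are made up for illustration and do not describe how 23andMe or any other company actually computes its reports.

```python
def rank_matches(markers, panel):
    """Rank the panel's labels by how closely each profile fits the markers.

    markers: list of 0/1 values for one person.
    panel: {label: [marker frequencies between 0 and 1]}.
    Only labels that exist in the panel can ever be returned.
    """
    def score(profile):
        agreement = [p if m == 1 else 1 - p for m, p in zip(markers, profile)]
        return sum(agreement) / len(markers)

    return sorted(panel, key=lambda label: score(panel[label]), reverse=True)


# Hypothetical reference panel with invented frequencies. Note what is absent:
# there is no "Kazakh" entry, so that answer is impossible by construction.
panel = {
    "East Asian": [0.9, 0.8, 0.2, 0.1],
    "European": [0.1, 0.2, 0.9, 0.8],
    "Native American": [0.8, 0.9, 0.3, 0.4],
}

person = [1, 1, 0, 1]
print(rank_matches(person, panel))
```

With these invented numbers the top-ranked label is “Native American,” followed by “East Asian” and “European.” The ranking says which of the available labels fits best, not that any of them is correct.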

Finally, it’s important to realize that while DNA does contain a record of the past, it is a fragmented and incomplete record. Some commercial tests rely upon mitochondrial DNA, which is passed down only through the matriline (the ancestral line involving the mother’s mother’s mother’s mother, etc.). This is one tiny branch on a huge family tree. Likewise, for males, some tests rely upon the Y chromosome, which is passed only along the patriline (father’s father’s father, etc.). To use an analogy, imagine the numerous rivers and creeks that flow into an ocean. We would not claim that one tiny creek among all of those tributaries is “the” source of the ocean. Some commercial tests recognize this and rely not upon mitochondrial or Y-chromosome DNA, but rather upon full autosomal DNA (the 22 pairs of chromosomes other than the X/Y sex chromosomes). Unlike mitochondrial or Y-chromosome DNA, autosomal DNA is potentially influenced by all the ancestral lines leading to the present-day person. Yet even the entire genome of a present-day person includes only the ancestral gene segments that survived the mutation and recombination that happened in each new generation. Consider that your DNA length is finite, but the more generations you go back, the more ancestors you have. Many DNA segments from these ancestors must have been simply lost to history. These lost genes are part of your ancestral past too, but they are not evident in your genome. Thus, your genome reflects only a portion of what came before.
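The arithmetic behind that last point can be made concrete with a rough Python sketch. The number of ancestor “slots” in a family tree doubles each generation, while the number of separately inherited autosomal segments grows only by a roughly fixed amount per generation. The figure of 33 crossovers per generation is a round number assumed here for illustration; it does not come from this essay, and real values vary between people and between the sexes.

```python
def ancestor_slots(generations_back):
    """Positions in the family tree at a given generation (2 parents, 4 grandparents, ...)."""
    return 2 ** generations_back


def approx_segments(generations_back, chromosomes=22, crossovers=33):
    """Rough count of autosomal segments traceable to that generation.

    Assumption: each transmission adds about `crossovers` breakpoints to the
    22 autosomes, and a person carries two copies (one from each parent).
    """
    return 2 * (chromosomes + crossovers * (generations_back - 1))


for g in (1, 2, 5, 10, 20):
    slots = ancestor_slots(g)
    segments = approx_segments(g)
    # Each segment comes from one slot, so any excess of slots over
    # segments is a lower bound on slots that contributed no DNA at all.
    print(g, slots, segments, max(0, slots - segments))
```

Under these assumptions, ten generations back there are 1,024 ancestor slots but only about 638 segments to go around, so several hundred of those slots cannot have left any autosomal trace. Twenty generations back, the mismatch is more than a million slots against roughly 1,300 segments. (Slots are positions in the tree; the same person may occupy more than one.)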

It’s understandable that people are fascinated with the past. It is especially understandable that people with questions about their past, such as adoptees and their families, seek any source of information that may be illuminating. And direct-to-consumer DNA tests may have some beneficial or informative effects. For example, a person who believed herself to have fully African heritage may re-examine that belief (and revisit family history) if a DNA test indicates a partial match with European ancestry. Aside from implications about an individual’s heritage, the tests may prompt fruitful conversations about the meaning of ancestry and the role of the genetic past in contributing to a sense of identity in the present. The tests could stimulate a renewed interest in the history of human populations and migration. Most optimistically, the tests could reaffirm a sense in which we are all, in the end, human. But the danger is that such tests reinforce a false biological conception of race and ethnicity and mislead us into thinking that the answers to personal identity can be found at the molecular level.