
How Do We Diversify Human Genomics Research?

Collection Editor(s)
Name & Degree
Alice B. Popejoy, PhD
Work Title/Institution
Assistant Professor, Division of Epidemiology, Department of Public Health Sciences, University of California, Davis
  • Introduction

    The lack of diversity in human genomics research and reference datasets is a known problem, with real-life consequences, but the path forward remains elusive. Solutions to the problem of sampling bias in human genetics research must begin by understanding the following: 1) historical influences and current problems with classification, study design, and conceptual frameworks in human genetics research; 2) tensions between characterizing who has been included in (and excluded from) research, and biases that are introduced through categorizing humans by social and cultural population descriptors (e.g., race, ethnicity, ancestry); and 3) how a lack of diversity and conceptual clarity in human genomics leads to disparities in clinical genetics and precision medicine. Finally, we must recognize how a lack of diversity in the genomics workforce perpetuates these blind spots and how active recruitment and promotion of people from all backgrounds improves both our science and workplace environments.

    First, it is important to recognize how historical narratives, outdated methods, and the lack of clear, harmonized language to describe human populations combine to limit progress in human genetics. Historical notions of biological differences between racial groups cause harm, as exemplified by Nazi Germany’s use of concepts from the American Eugenics Movement in the perpetration of the Holocaust. Recent Presidents of the American Society of Human Genetics (ASHG), Drs. Gail Jarvik and Charles Rotimi, have begun to confront this history in their addresses at annual meetings, and the society published a statement denouncing attempts to use genetics as a justification for racial and ethnic injustice. However, as illustrated in Roberts’ Fatal Invention, societal context drives biomedical researchers to re-use and re-invent outdated, unscientific, and harmful notions, for example by constructing ‘ancestry’ groups to obfuscate the field’s long-standing reliance on race categories to conduct basic research (including studies designed to explain purely social phenomena).

    More than a decade ago, an interdisciplinary group of human population geneticists and bioethicists described how methods and concepts employed in modern human genetics research are rooted in the misguided belief that race categories are innate or fundamental to human biology. These methods and concepts have been foundational to the field in both explicit and implicit ways. We must be proactive in understanding and mitigating this problematic foundation as many investigators across disciplines begin to adopt genetic ancestry estimates as an alternative to social categories. The way we construct relative measures of genetic ancestry is itself problematic: we often rely on social categories and faulty assumptions about the relationships between identity groups, ancestral origins, and genomic variation.

    Principal Component Analysis (PCA) is one example of a method reliant upon the incorrect use of race as a biological category. It was used in a publication to show how genotype information could be used to re-create a map of Europe, which led people to believe that PCA plots were end-products that could be interpreted as direct evidence of population structure. The lead author of the paper later published extensively on the limitations of such approaches. These publications stressed that overall differences in allele frequencies across the geography of Europe are very small, and that this pattern of allele frequencies is not universal across all regions of the world. Nevertheless, PCA and other statistical techniques designed to collapse the complexity of multi-dimensional genetic data into a 2D figure are routinely, and incorrectly, interpreted in human genetics research as indicative of global population structure.
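
    To make the point about dimensionality reduction concrete, the sketch below runs PCA on purely simulated genotypes and reports how much of the total variance a two-dimensional plot would retain. Everything in it is an illustrative assumption rather than data or methods from any study cited here: the function names, the sample sizes, the continuous "position" variable, and the small per-variant frequency shift (`max_shift`) are all hypothetical, chosen only to mimic allele frequencies that vary slightly and gradually with no discrete groups.

```python
import numpy as np


def simulate_gradient_genotypes(n_individuals=200, n_variants=500,
                                max_shift=0.05, seed=0):
    """Simulate 0/1/2 genotype counts with small, continuous frequency shifts.

    ASSUMPTION: this is a toy model, not real data. Each individual has a
    continuous position in [0, 1]; no discrete groups are ever defined.
    """
    rng = np.random.default_rng(seed)
    position = rng.uniform(0.0, 1.0, size=n_individuals)
    base_freq = rng.uniform(0.1, 0.9, size=n_variants)
    # Each variant's allele frequency drifts slightly along the gradient.
    slope = rng.uniform(-max_shift, max_shift, size=n_variants)
    freq = np.clip(base_freq + np.outer(position - 0.5, slope), 0.01, 0.99)
    # Two allele copies per individual per variant.
    return rng.binomial(2, freq).astype(float)


def pca_variance_explained(genotypes):
    """Return the fraction of total variance captured by each component."""
    centered = genotypes - genotypes.mean(axis=0)
    # Squared singular values of the centered matrix are proportional
    # to the variance along each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


genotypes = simulate_gradient_genotypes()
explained = pca_variance_explained(genotypes)
top_two = explained[:2].sum()
print(f"Variance retained by a 2D PCA plot: {top_two:.1%}")
```

    Under these assumptions, the first two components retain only a small share of the total variance, so a 2D plot discards most of the variation among individuals. Any clusters a reader perceives in such a plot depend heavily on who was sampled and how the axes are scaled, which is one reason a PCA figure is better treated as an exploratory summary than as evidence of discrete populations.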

    Despite efforts by the American Medical Association (AMA) to standardize how race and ethnicity are reported in biomedical research, vast inconsistencies persist. The concept of ‘admixture’ remains a problem because it assumes the existence of ‘pure’ or unadmixed categories, which is a result of the race-realist origins of the term. A retrospective, longitudinal review of language used to describe human populations published in The American Journal of Human Genetics demonstrated a decrease in the use of offensive and scientifically meaningless terms such as ‘Caucasian’ and ‘Negro’; however, notions of continental ancestry such as ‘African’, ‘Asian’, and ‘European’ have increased. Members of the Editorial Board of Genetics in Medicine have offered a path forward for more precise and intentional uses of language to describe human populations in publishing, including recommendations across three axes of identity: sex/gender, race/ethnicity and disability.

    A major barrier to reconciling inconsistencies and mass confusion on these topics is the ready conflation of the social categories used to measure and motivate greater diversity, equity, and inclusion (DEI) in research with the semantic labels often used to describe ‘populations’ in human genetics research. These labels may simply be a re-imagination of the social categories with the added cachet of sounding scientific (e.g., ‘African ancestry’ vs. ‘Black or African American’), despite the absence of any meaningful distinction. The fact is that most human genomic variation is shared among groups or ‘populations’, however they are defined, except for those impacted by historical founder events that reduced genetic diversity within an ancestral population (e.g., North/West Europeans).

    We humans all have a common origin, somewhere on the continent of Africa. The genomic variation in our species mostly comes from that original population, such that there is more total genetic variation between two individuals from the continent of Africa than between either one of those individuals and a third person from another continent. Some variants have been lost in ancestral founder populations (e.g., via bottlenecks), so variants considered rare or absent in, say, European ancestries may in fact be common elsewhere. Allele frequency differences are not large, nor do they delineate humans into groups. Analytic methods should therefore not rely on social or cultural categories as a proxy for stratifying samples, and ancestry groupings not grounded in the historical migration patterns of humans or in basic population genetic theory should be avoided.

    Most of the world’s genomic variation is missing from the knowledge base because white Europeans have been oversampled and overstudied, a gap with direct consequences for both research and clinical care. It is important to demonstrate how increasing diversity and inclusion in genomics research could improve equity in genomic medicine and enhance precision medicine overall. Conducting genetic analyses with diverse populations enhances our ability to discover the various contributors to complex diseases that are heterogeneous in their genetic and environmental causes. Failure to account for diversity in the genetic components of disease etiology harms those whose genomic backgrounds are excluded from basic research by increasing the likelihood that they will receive an uninformative or incorrect medical genetic test result. Medical guidelines that rely on race and ethnicity categories also create disparities by limiting access to care, so continuing the practice of race-based medicine is not a valid solution. Expanded carrier screening has been suggested as a way forward to reduce the harms of ethnicity-based reimbursement strategies faced by genetic testing labs.

    Current understanding among clinical genetics professionals, as well as existing standards and guidelines, is insufficient to deal with these issues in a nuanced way. Greater attention should be paid to social determinants of health in research and medical training, so that aspiring geneticists gain an intuition for the importance of causal pathway analyses and the public health conceptual frameworks that lead to strong study design. Treating human subjects as experts in their own lives, and asking people about their experiences and identities using open-ended and open-minded approaches, will likely open doors to more precise and accurate interpretations of observed disparities. We must do everything possible to avoid the pitfalls of inferring causality from statistical measures that reduce complicated systems to two dimensions, and to root out the widespread influence of confounding by social and political systems in association tests.

    A full understanding of the foundational, conceptual challenges laid out in this brief could be used to motivate change in the field, including adopting published recommendations for prioritizing sample selection based on genetic diversity, improving our study designs and analytic approaches, and respectfully engaging underserved communities, including Indigenous populations, in research. Importantly, when considering strategies to increase diversity and broaden participation in genomics research, we must actively engage people from all backgrounds to elevate and integrate perspectives informed by other disciplines, diverse lived experiences, cultures, and geographies. For example, social scientists ‘embedded’ into research teams could help genomics researchers deal with some of the more difficult and nuanced aspects of our projects. In addition to employing the FAIR principles for working with valuable data (findable, accessible, interoperable, and reusable), researchers and institutions should also consider integrating the CARE principles (collective benefit, authority to control, responsibility, and ethics), which are designed to respect Indigenous data sovereignty and enhance the trustworthiness of research enterprises.

    Ultimately, these concepts and principles must be integrated into research and medical practice beyond the surface of investigating the human genome: the scientific and health systems workforce must reflect the diversity of the communities we study and serve. Active recruitment, promotion, retention, and recognition of individuals whose identities have yet to be adequately represented in medicine and in science, technology, engineering, and mathematics (STEM) will improve the quality of our work as well as the social, cultural, and innovative capacity of our workforce. Equitable team dynamics are critical, as uprooting racism must become part of our collective identity and culture.

Historical Narratives and Current Problems with Classification, Study Design, and Conceptual Frameworks
Reconciling the Need for Greater Diversity and Shared Genomic Variation Between Groups
Increasing Diversity Could Improve Equity in Genomic Medicine and Enhance Precision Medicine
Including Diverse Perspectives to Broaden Participation in Genomics Research
Tags
genetic diversity
Admixture
anti-racism
FAIR data principles

About ELSIhub Collections

  • ELSIhub Collections are essential reading lists on fundamental or emerging topics in ELSI, curated and explained by expert Collection Editors, often paired with ELSI trainees. This series assembles materials from cross-disciplinary literatures to enable quick access to key information.
