Kasia Bryc | Publications

2025

Bayesian inference of population structure using identity-by-descent-based stochastic block models

Micheletti SJ
Bryc K
Esselmann SAG
Wilton PR
23andMe Research Team
Freyman WA

2025 Nov 26
[bioRxiv]

Genetic testing predicts appearance but not behavior in dogs

Lord KA
Sohrab V
Bryc K
White ME
Kenney B
Pirovich KM
Chen FL
Karlsson E

PNAS
2025 Nov 24
[paper]

2023

The genetic legacy of African Americans from Catoctin Furnace

Harney E
Micheletti S
Bruwelheide KS
Freyman W
Bryc K
Akbari A
Jewett E
Comer E
Gates HL
...
23andMe Research Team
Rohland N
Mountain JL
Owsley DW
Reich D

Science
2023 Aug 3
[science]

GWAS of cataract in Puerto Ricans identifies a novel large-effect variant in ITGA6

Shi J
O'Connell J
Hicks B
Wang W
Bryc K
Brady JJ
Vacic V
Freyman W
Abul-Husn NS
Auton A
23andMe Research Team
Shringarpure S

2023 Jul 25
[medRxiv]

2022

Response to Pfenning and Lachance

Micheletti SJ
Esselmann SA
Bryc K
Mountain JL

American Journal of Human Genetics
2022 Feb 2
[paper]

2021

Fast and Robust Identity-by-Descent Inference with the Templated Positional Burrows–Wheeler Transform

Freyman WA
McManus KF
Shringarpure SS
Jewett EM
Bryc K
23andMe Research Team
Auton A

Molecular Biology and Evolution
2021 May 5
[paper]

Abstract
Estimating the genomic location and length of identical-by-descent (IBD) segments among individuals is a crucial step in many genetic analyses. However, the exponential growth in the size of biobank and direct-to-consumer genetic data sets makes accurate IBD inference a significant computational challenge. Here we present the templated positional Burrows–Wheeler transform (TPBWT) to make fast IBD estimates robust to genotype and phasing errors. Using haplotype data simulated over pedigrees with realistic genotyping and phasing errors, we show that the TPBWT outperforms other state-of-the-art IBD inference algorithms in terms of speed and accuracy. For each phase-aware method, we explore the false positive and false negative rates of inferring IBD by segment length and characterize the types of error commonly found. Our results highlight the fragility of most phased IBD inference methods; the accuracy of IBD estimates can be highly sensitive to the quality of haplotype phasing. Additionally, we compare the performance of the TPBWT against a widely used phase-free IBD inference approach that is robust to phasing errors. We introduce both in-sample and out-of-sample TPBWT-based IBD inference algorithms and demonstrate their computational efficiency on massive-scale data sets with millions of samples. Furthermore, we describe the binary file format for TPBWT-compressed haplotypes that results in fast and efficient out-of-sample IBD computes against very large cohort panels. Finally, we demonstrate the utility of the TPBWT in a brief empirical analysis, exploring geographic patterns of haplotype sharing within Mexico. Hierarchical clustering of IBD shared across regions within Mexico reveals geographically structured haplotype sharing and a strong signal of isolation by distance. Our software implementation of the TPBWT is freely available for noncommercial use in the code repository (https://github.com/23andMe/phasedibd, last accessed January 11, 2021).

2020

Genetic Consequences of the Transatlantic Slave Trade in the Americas

Micheletti SJ
Bryc K
Esselmann SA
Freyman WA
Moreno ME
Poznik GD
Shastri AJ
23andMe Research Team
Beleza S
Mountain JL

American Journal of Human Genetics
2020 Aug 6
[paper]

Abstract
According to historical records of transatlantic slavery, traders forcibly deported an estimated 12.5 million people from ports along the Atlantic coastline of Africa between the 16th and 19th centuries, with global impacts reaching to the present day, more than a century and a half after slavery’s abolition. Such records have fueled a broad understanding of the forced migration from Africa to the Americas yet remain underexplored in concert with genetic data. Here, we analyzed genotype array data from 50,281 research participants, which—combined with historical shipping documents—illustrate that the current genetic landscape of the Americas is largely concordant with expectations derived from documentation of slave voyages. For instance, genetic connections between people in slave trading regions of Africa and disembarkation regions of the Americas generally mirror the proportion of individuals forcibly moved between those regions. While some discordances can be explained by additional records of deportations within the Americas, other discordances yield insights into variable survival rates and timing of arrival of enslaved people from specific regions of Africa. Furthermore, the greater contribution of African women to the gene pool compared to African men varies across the Americas, consistent with literature documenting regional differences in slavery practices. This investigation of the transatlantic slave trade, which is broad in scope in terms of both datasets and analyses, establishes genetic links between individuals in the Americas and populations across Atlantic Africa, yielding a more comprehensive understanding of the African roots of peoples of the Americas.

White Paper 23-05: Neanderthal Ancestry Inference

Smith RP
Kleinman A
Bryc K
Mountain J
Durand EY
McManus K
Esselmann SA

23andMe White Paper
2020 Jul 1
[whitepaper]

Abstract
This white paper serves as a companion to the Neanderthal report offered to 23andMe customers as part of the 23andMe Personal Genome Service. It offers more details on the methods used in calculating Neanderthal variant counts and in detecting human traits that are associated with Neanderthal variants. We hope that this white paper may also serve the scientific community to understand how genetic variation inherited from Neanderthal ancestors may affect modern-day phenotypes.

2017

White Paper 23-14: Ancestry Timeline

Bryc K
Durand EY
Mountain J

23andMe White Paper
2017 Mar 10
[whitepaper]

Abstract
Ancestry Timeline is a 23andMe feature that enables customers to find out, for each of the ancestries they carry, when they may have had an ancestor in their genealogy who was likely to be a non-admixed representative of that population. This document is a technical description of the statistical methodology supporting this feature.

2014

The genetic ancestry of African Americans, Latinos, and European Americans across the United States

Bryc K
Durand EY
Macpherson M
Reich D
Mountain J

American Journal of Human Genetics
2014 Dec 18
[paper]

Abstract
Over the past 500 years, North America has been the site of ongoing mixing of Native Americans, European settlers, and Africans (brought largely by the trans-Atlantic slave trade), shaping the early history of what became the United States. We studied the genetic ancestry of 5,269 self-described African Americans, 8,663 Latinos, and 148,789 European Americans who are 23andMe customers and show that the legacy of these historical interactions is visible in the genetic ancestry of present-day Americans. We document pervasive mixed ancestry and asymmetrical male and female ancestry contributions in all groups studied. We show that regional ancestry differences reflect historical events, such as early Spanish colonization, waves of immigration from many regions of Europe, and forced relocation of Native Americans within the US. This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry.

2013

Separation of the largest eigenvalues in eigenanalysis of genotype data from discrete subpopulations

Bryc K
Bryc W
Silverstein JW

Theoretical Population Biology
2013 Aug 20
[paper]

Abstract
We present a mathematical model, and the corresponding mathematical analysis, that justifies and quantifies the use of principal component analysis of biallelic genetic marker data for a set of individuals to detect the number of subpopulations represented in the data. We indicate that the power of the technique relies more on the number of individuals genotyped than on the number of markers.

A Novel Approach to Estimating Heterozygosity from Low-Coverage Genome Sequence

Bryc K
Patterson N
Reich D

Genetics
2013 Aug 9
[paper]

Abstract
High-throughput shotgun sequence data make it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites within an individual’s genome, which is informative about inbreeding and effective population size. However, in many cases, the available sequence data of an individual are limited to low coverage, preventing the confident calling of genotypes necessary to directly count the proportion of heterozygous sites. Here, we present a method for estimating an individual’s genome-wide rate of heterozygosity from low-coverage sequence data, without an intermediate step that calls genotypes. Our method jointly learns the shared allele distribution between the individual and a panel of other individuals, together with the sequencing error distributions and the reference bias. We show our method works well, first, by its performance on simulated sequence data and, second, on real sequence data where we obtain estimates using low-coverage data consistent with those from higher coverage. We apply our method to obtain estimates of the rate of heterozygosity for 11 humans from diverse worldwide populations and through this analysis reveal the complex dependency of local sequencing coverage on the true underlying heterozygosity, which complicates the estimation of heterozygosity from sequence data. We show how we can use filters to correct for the confounding arising from sequencing depth. We find in practice that ratios of heterozygosity are more interpretable than absolute estimates and show that we obtain excellent conformity of ratios of heterozygosity with previous estimates from higher-coverage data.

2012

A high-coverage genome sequence from an archaic Denisovan individual

Meyer M
Kircher M
Gansauge MT
Li H
Racimo F
Mallick S
Schraiber JG
Jay F
Prüfer K
de Filippo C
Sudmant PH
Alkan C
Fu Q
Do R
Rohland N
Tandon A
Siebauer M
Green RE
Bryc K
Briggs AW
Stenzel U
Dabney J
Shendure J
Kitzman J
Hammer MF
Shunkov MV
Derevianko AP
Patterson N
Andrés AM
Eichler EE
Slatkin M
Reich D
Kelso J
Pääbo S

Science
2012 Oct 12
[paper]

Abstract
We present a DNA library preparation method that has allowed us to reconstruct a high-coverage (30×) genome sequence of a Denisovan, an extinct relative of Neandertals. The quality of this genome allows a direct estimation of Denisovan heterozygosity indicating that genetic diversity in these archaic hominins was extremely low. It also allows tentative dating of the specimen on the basis of “missing evolution” in its genome, detailed measurements of Denisovan and Neandertal admixture into present-day human populations, and the generation of a near-complete catalog of genetic changes that swept to high frequency in modern humans since their divergence from Denisovans.

Population genetic inference from personal genome data: impact of ancestry and admixture on human genomic variation

Kidd JM
Gravel S
Byrnes J
Moreno-Estrada A
Musharoff S
Bryc K
Degenhardt JD
Brisbin A
Sheth V
Chen R
McLaughlin SF
Peckham HE
Omberg L
Bormann Chung CA
Stanley S
Pearlstein K
Levandowsky E
Acevedo-Acevedo S
Auton A
Keinan A
Acuña-Alonzo V
Barquera-Lozano R
Canizales-Quinteros S
Eng C
Burchard EG
Russell A
Reynolds A
Clark AG
Reese MG
Lincoln SE
Butte AJ
De La Vega FM
Bustamante CD

American Journal of Human Genetics
2012 Oct 5
[paper]

Abstract
Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today’s African Americans dates back to European gene flow happening only 7-8 generations ago.

PCAdmix: Principal Components-based assignment of ancestry along each chromosome in individuals with admixed ancestry from two or more populations

Brisbin A
Bryc K
Byrnes J
Zakharia F
Omberg L
Degenhardt J
Reynolds A
Ostrer H
Mezey JG
Bustamante CD

Human Biology
2012 Aug
[paper]

Abstract
Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix, a Principal Components-based algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping.

2011

On identifying the optimal number of population clusters via the deviance information criterion

Gao H
Bryc K
Bustamante CD

PLoS One
2011 Jun 28
[paper]

Abstract
Inferring population structure using Bayesian clustering programs often requires a priori specification of the number of subpopulations, K, from which the sample has been drawn. Here, we explore the utility of a common Bayesian model selection criterion, the Deviance Information Criterion (DIC), for estimating K. We evaluate the accuracy of DIC, as well as other popular approaches, on datasets generated by coalescent simulations under various demographic scenarios. We find that DIC outperforms competing methods in many genetic contexts, validating its application in assessing population structure.

2010

Genome sequencing and analysis of admixed genomes of African and Mexican ancestry: implications for personal ancestry reconstruction and multi-ethnic medical genomics

De La Vega FM
Bryc K
Degenhardt J
Musharoff S
Kidd JM
Sheth V
Stanley S
Brisbin A
Keinan A
Clark A
Bustamante CD

Genome Biology
2010 Oct 11
[paper]

Abstract
Understanding the contribution of rare and common genetic variants to disease susceptibility will probably require multi- and trans-ethnic sequencing studies that compare the genomes of many individuals with and without a particular disease. Accounting for the role of population stratification at fine scales, both in terms of genomic and geographic location, will be important because rare alleles are likely to show more population stratification. Here, we present results from sequencing, assembly and genomic analysis of two genomes from the Phase 3 HapMap. The donor individuals are of Mexican and African ancestry and represent the first ‘admixed’ genomes to be sequenced to high coverage.

Genome-wide patterns of population structure and admixture among Hispanic/Latino populations

Bryc K
Velez C
Hammer M
Karafet T
Ostrer H
Bustamante CD

PNAS
2010 May
[paper]

Abstract
Hispanic/Latino populations possess a complex genetic structure that reflects recent admixture among and potentially ancient substructure within Native American, European, and West African source populations. Here, we quantify genome-wide patterns of SNP and haplotype variation among 100 individuals with ancestry from Ecuador, Colombia, Puerto Rico, and the Dominican Republic genotyped on the Illumina 610-Quad arrays and 112 Mexicans genotyped on Affymetrix 500K platform. Intersecting these data with previously collected high-density SNP data from 4,305 individuals, we use principal component analysis and clustering methods FRAPPE and STRUCTURE to investigate genome-wide patterns of African, European, and Native American population structure within and among Hispanic/Latino populations. Comparing autosomal, X and Y chromosome, and mtDNA variation, we find evidence of a significant sex bias in admixture proportions consistent with disproportionate contribution of European male and Native American female ancestry to present-day populations. We also find that patterns of linkage-disequilibria in admixed Hispanic/Latino populations are largely affected by the admixture dynamics of the populations, with faster decay of LD in populations of higher African ancestry. Finally, using the locus-specific ancestry inference method LAMP, we reconstruct fine-scale chromosomal patterns of admixture. We document moderate power to differentiate among potential subcontinental source populations within the Native American, European, and African segments of the admixed Hispanic/Latino genomes. Our results suggest future genome-wide association scans in Hispanic/Latino populations may require correction for local genomic ancestry at a subcontinental scale when associating differences in the genome with disease risk, progression, and drug efficacy, as well as for admixture mapping.

Genome-wide SNP and haplotype analyses reveal a rich history underlying dog domestication

VonHoldt BM
Han E
Pollinger J
Lohmueller K
Earl DA
Parker HG
Quignon P
Boyko A
Auton A
Reynolds A
Bryc K
Brisbin A
Knowles J
Mosher DS
Spady TC
Elkahloun A
Pilot M
Grecco C
Randi E
Bannasch D
Kays R
Wilton A
Shearman J
Cargill M
Jones PG
Zuwei Q
Zhou W
Zhang Y
Bustamante CD
Ostrander EA
Novembre J
Wayne RK

Nature
2010 Mar
[paper]

Abstract
Advances in genome technology have facilitated a new understanding of the historical and genetic processes crucial to rapid phenotypic evolution under domestication. To understand the process of dog diversification better, we conducted an extensive genome-wide survey of more than 48,000 single nucleotide polymorphisms in dogs and their wild progenitor, the grey wolf. Here we show that dog breeds share a higher proportion of multi-locus haplotypes unique to grey wolves from the Middle East, indicating that they are a dominant source of genetic diversity for dogs rather than wolves from east Asia, as suggested by mitochondrial DNA sequence data. Furthermore, we find a surprising correspondence between genetic and phenotypic/functional breed groupings but there are exceptions that suggest phenotypic diversification depended in part on the repeated crossing of individuals with novel phenotypes. Our results show that Middle Eastern wolves were a critical source of genome diversity, although interbreeding with local wolf populations clearly occurred elsewhere in the early history of specific lineages. More recently, the evolution of modern dog breeds seems to have been an iterative process that drew on a limited genetic toolkit to create remarkable phenotypic diversity.

Genome-wide patterns of population structure and admixture in Africans and African Americans

Bryc K
Nelson MR
Oksenberg JR
Hauser SL
Williams S
Bustamante CD
Tishkoff SA

PNAS
2010 Jan
[paper]

Abstract
Quantifying patterns of population structure in Africans and African Americans illuminates the history of human populations and is critical for undertaking medical genomic studies on a global scale. To obtain a fine-scale genome-wide perspective of ancestry, we analyze Affymetrix GeneChip 500K genotype data from African Americans (n = 365) and individuals with ancestry from West Africa (n = 203 from 12 populations) and Europe (n = 400 from 42 countries). We find that population structure within the West African sample reflects primarily language and secondarily geographical distance, echoing the Bantu expansion. Among African Americans, analysis of genomic admixture by a principal component-based approach indicates that the median proportion of European ancestry is 18.5% (25th–75th percentiles: 11.6–27.7%), with very large variation among individuals. In the African-American sample as a whole, few autosomal regions showed exceptionally high or low mean African ancestry, but the X chromosome showed elevated levels of African ancestry, consistent with a sex-biased pattern of gene flow with an excess of European male and African female ancestry. We also find that genomic profiles of individual African Americans afford personalized ancestry reconstructions differentiating ancient vs. recent European and African ancestry. Finally, patterns of genetic similarity among inferred African segments of African-American genomes and genomes of contemporary African populations included in this study suggest African ancestry is most similar to non-Bantu Niger-Kordofanian-speaking populations, consistent with historical documents of the African Diaspora and trans-Atlantic slave trade.

2009

Global distribution of genomic diversity underscores rich complex history of continental human populations

Auton A
Bryc K
Boyko AR
Lohmueller K
Wright K
Novembre J
Reynolds A
Indap A
Degenhardt J
King KS
Nelson MR
Bustamante CD

Genome Research
2009 May 1
[paper]

Abstract
Characterizing patterns of genetic variation within and among human populations is important for understanding human evolutionary history and for careful design of medical genetic studies. Here, we analyze patterns of variation across 443,434 single nucleotide polymorphisms (SNPs) genotyped in 3845 individuals from four continental regions. This unique resource allows us to illuminate patterns of diversity in previously under-studied populations at the genome-wide scale including Latin America, South Asia, and Southern Europe. Key insights afforded by our analysis include quantifying the degree of admixture in a large collection of individuals from Guadalajara, Mexico; identifying language and geography as key determinants of population structure within India; and elucidating a north–south gradient in haplotype diversity within Europe. We also present a novel method for identifying long-range tracts of homozygosity indicative of recent common ancestry. Application of our approach suggests great variation within and among populations in the extent of homozygosity, suggesting both demographic history (such as population bottlenecks) and recent ancestry events (such as consanguinity) play an important role in patterning variation in large modern human populations.

Genome-wide survey of SNP variation uncovers the genetic structure of cattle breeds

Bovine HapMap Consortium

Science
2009 Apr 24
[paper]

Abstract
The imprints of domestication and breed development on the genomes of livestock likely differ from those of companion animals. A deep draft sequence assembly of shotgun reads from a single Hereford female and comparative sequences sampled from six additional breeds were used to develop probes to interrogate 37,470 single-nucleotide polymorphisms (SNPs) in 497 cattle from 19 geographically and biologically diverse breeds. These data show that cattle have undergone a rapid recent decrease in effective population size from a very large ancestral population, possibly due to bottlenecks associated with domestication, selection, and breed formation. Domestication and artificial selection appear to have left detectable signatures of selection within the cattle genome, yet the current levels of diversity within breeds are at least as great as exists within humans.

2008

Genes mirror geography within Europe

Novembre J
Johnson T
Bryc K
Kutalik Z
Boyko AR
Auton A
Indap A
King KS
Bergmann S
Nelson MR
Stephens M
Bustamante CD

Nature
2008 Aug 31
[paper]

Abstract
Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing; an individual’s DNA can be used to infer their geographic origin with surprising accuracy—often to within a few hundred kilometres.

The Population Reference Sample, POPRES: A Resource for Population, Disease, and Pharmacological Genetics Research

Nelson MR
Bryc K
King KS
Indap A
Boyko AR
Novembre J
Briley LP
Maruyama Y
Waterworth DM
Waeber G
Vollenweider P
Oksenberg JR
Hauser SL
Stirnadel HA
Kooner JS
Chambers JC
Jones B
Mooser V
Bustamante CD
Roses AD
Burns DK
Ehm MG
Lai EH

American Journal of Human Genetics
2008 Aug 27
[paper]

Abstract
Technological and scientific advances, stemming in large part from the Human Genome and HapMap projects, have made large-scale, genome-wide investigations feasible and cost effective. These advances have the potential to dramatically impact drug discovery and development by identifying genetic factors that contribute to variation in disease risk as well as drug pharmacokinetics, treatment efficacy, and adverse drug reactions. In spite of the technological advancements, successful application in biomedical research would be limited without access to suitable sample collections. To facilitate exploratory genetics research, we have assembled a DNA resource from a large number of subjects participating in multiple studies throughout the world. This growing resource was initially genotyped with a commercially available genome-wide 500,000 single-nucleotide polymorphism panel. This project includes nearly 6,000 subjects of African-American, East Asian, South Asian, Mexican, and European origin. Seven informative axes of variation identified via principal-component analysis (PCA) of these data confirm the overall integrity of the data and highlight important features of the genetic structure of diverse populations. The potential value of such extensively genotyped collections is illustrated by selection of genetically matched population controls in a genome-wide analysis of abacavir-associated hypersensitivity reaction. We find that matching based on country of origin, identity-by-state distance, and multidimensional PCA do similarly well to control the type I error rate. The genotype and demographic data from this reference sample are freely available through the NCBI database of Genotypes and Phenotypes (dbGaP).
