10.5061/DRYAD.HT0HS
McTavish, Emily Jane
University of Kansas
Hillis, David M.
The University of Texas at Austin
Data from: How does ascertainment bias in SNP analyses affect inferences
about population history?
Dryad
dataset
2015
ascertainment bias
Gene-flow
Bos indicus
SNP chip
Bos taurus
2015-04-08T14:37:44Z
2015-04-08T14:37:44Z
en
https://doi.org/10.1186/s12864-015-1469-5
113720652 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Background: The selection of variable sites for inclusion in genomic
analyses can influence results, especially when exemplar populations are
used to determine polymorphic sites. We tested the impact of ascertainment
bias on the inference of population genetic parameters using empirical and
simulated data representing the three major continental groups of cattle:
European, African, and Indian. We simulated data under three demographic
models. Each simulated data set was subjected to three ascertainment
schemes: (I) random selection; (II) geographically biased selection; and
(III) selection biased toward loci polymorphic in multiple groups.
Empirical data comprised samples of 25 individuals representing each
continental group. These cattle were genotyped for 47,506 loci from the
bovine 50 K SNP panel. We compared the inference of population histories
for the empirical and simulated data sets across different ascertainment
conditions using FST and principal components analysis (PCA). Results:
Bias toward shared polymorphism across continental groups is apparent in
the empirical SNP data. Bias toward uneven levels of within-group
polymorphism decreases estimates of FST between groups.
Subpopulation-biased selection of SNPs changes the weighting of principal
component axes and can affect inferences about proportions of admixture
and population histories using PCA. PCA-based inferences of population
relationships are largely congruent across types of ascertainment bias,
even when ascertainment bias is strong. Conclusions: Analyses of
ascertainment bias in genomic data have largely been conducted on human
data. As genomic analyses are being applied to non-model organisms, and
across taxa with deeper divergences, care must be taken to consider the
potential for bias in ascertainment of variation to affect inferences.
Estimates of FST, time of separation, and population divergence as
estimated by principal components analysis can be misleading if this bias
is not taken into account.
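The three ascertainment schemes and the FST comparison described in the abstract can be illustrated with a minimal sketch. This is not the code from the deposited notebooks (simulations.ipynb, analyses.ipynb). The function names, the scheme labels, and the toy allele frequencies are hypothetical, and the filters act on true population allele frequencies rather than on a finite discovery panel as a real SNP chip design would.

```python
def heterozygosities(p1, p2):
    """Return (H_T, H_S) for one biallelic locus in two equally weighted populations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total expected heterozygosity
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population value
    return h_t, h_s


def fst(p1, p2):
    """Wright's FST = (H_T - H_S) / H_T for a single locus; 0.0 if the locus is fixed."""
    h_t, h_s = heterozygosities(p1, p2)
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


def mean_fst(loci):
    """Multi-locus FST as a ratio of sums over loci (assumed estimator for this sketch)."""
    pairs = [heterozygosities(p1, p2) for p1, p2 in loci]
    total_h_t = sum(h_t for h_t, _ in pairs)
    total_h_s = sum(h_s for _, h_s in pairs)
    return 0.0 if total_h_t == 0 else (total_h_t - total_h_s) / total_h_t


def is_polymorphic(p):
    return 0.0 < p < 1.0


def ascertain(loci, scheme):
    """Filter loci under a hypothetical ascertainment scheme.

    'all'    : keep every locus (stands in for unbiased selection)
    'biased' : keep loci polymorphic in population 1 only (geographic bias)
    'shared' : keep loci polymorphic in both populations
    """
    if scheme == "all":
        return list(loci)
    if scheme == "biased":
        return [(p1, p2) for p1, p2 in loci if is_polymorphic(p1)]
    if scheme == "shared":
        return [(p1, p2) for p1, p2 in loci
                if is_polymorphic(p1) and is_polymorphic(p2)]
    raise ValueError(f"unknown scheme: {scheme}")


# Toy allele frequencies (population 1, population 2); invented for illustration.
toy_loci = [(0.0, 1.0), (0.5, 0.5), (0.2, 0.0), (1.0, 1.0)]

for scheme in ("all", "biased", "shared"):
    kept = ascertain(toy_loci, scheme)
    print(scheme, len(kept), round(mean_fst(kept), 3))
```

On these toy values, restricting to loci polymorphic in both populations discards the fixed difference between them, so the multi-locus FST of the retained loci is lower than the value computed from all loci.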
Data from: How does ascertainment bias in SNP analyses affect inferences
about population history?
IPython notebooks containing the simulation and analysis code for the
manuscript (named simulations.ipynb and analyses.ipynb, respectively),
together with the empirical and simulated data used in the manuscript.
AscBiasDryad.zip