10.5061/DRYAD.2CS4G
Schilling, Martin P.
Utah State University
Wolf, Paul G.
Utah State University
Duffy, Aaron M.
Utah State University
Rai, Hardeep S.
Utah State University
Rowe, Carol A.
Utah State University
Richardson, Bryce A.
United States Department of Agriculture
Mock, Karen E.
Utah State University
Data from: Genotyping-by-sequencing for Populus population genomics: an
assessment of genome sampling patterns and filtering approaches
Dryad
dataset
2015
genotyping-by-sequencing
data filtering
sampling bias
Populus trichocarpa
ApeKI
2015-01-28T00:00:00Z
2015-01-28T00:00:00Z
en
https://doi.org/10.1371/journal.pone.0095292
6553318 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Continuing advances in nucleotide sequencing technology are inspiring a
suite of genomic approaches in studies of natural populations. Researchers
are faced with data management and analytical scales that are increasing
by orders of magnitude. With such dramatic advances comes a need to
understand biases and error rates, which can be propagated and magnified
in large-scale data acquisition and processing. Here we assess genomic
sampling biases and the effects of various population-level data filtering
strategies in a genotyping-by-sequencing (GBS) protocol. We focus on data
from two species of Populus, because this genus has a relatively small
genome and is emerging as a target for population genomic studies. We
estimate the proportions and patterns of genomic sampling by examining the
Populus trichocarpa genome (Nisqually-1) and demonstrate a pronounced
bias towards coding regions when using the methylation-sensitive ApeKI
restriction enzyme in this species. Using population-level data from a
closely related species (P. tremuloides), we also investigate various
approaches for filtering GBS data to retain high-depth, informative SNPs
that can be used for population genetic analyses. We find that a data
filter including the designation of ambiguous alleles yielded metrics of
population structure and Hardy-Weinberg equilibrium that were most
consistent with previous studies of the same populations based on other
genetic markers. Analyses of the filtered data (27,910 SNPs) also resulted
in patterns of heterozygosity and population structure similar to a
previous study using microsatellites. Our application demonstrates that
technically and analytically simple approaches can readily be developed
for population genomics of natural populations.
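The per-SNP summaries the abstract relies on (call rate, heterozygosity, Hardy-Weinberg equilibrium) can be illustrated with a minimal Python sketch. It assumes biallelic SNPs with genotypes coded as 0, 1, or 2 copies of the alternate allele and None for a missing call. The function names and the call-rate and minor-allele-frequency thresholds are hypothetical illustrations, not the filters used in the study.

```python
from typing import Optional, Sequence


def snp_summary(genotypes: Sequence[Optional[int]]) -> dict:
    """Summarise one biallelic SNP.

    Genotypes are coded 0 (ref/ref), 1 (ref/alt), 2 (alt/alt); None = missing.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        return {"call_rate": 0.0, "ho": None, "he": None,
                "hwe_chi2": None, "maf": None}
    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p = (2 * n_aa + n_ab) / (2 * n)  # reference allele frequency
    q = 1 - p                        # alternate allele frequency
    ho = n_ab / n                    # observed heterozygosity
    he = 2 * p * q                   # expected heterozygosity under HWE
    observed = (n_aa, n_ab, n_bb)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    # Pearson chi-square against HWE expectations (1 df for a biallelic SNP);
    # classes with zero expected count are skipped.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return {"call_rate": n / len(genotypes), "ho": ho, "he": he,
            "hwe_chi2": chi2, "maf": min(p, q)}


def passes_filter(genotypes: Sequence[Optional[int]],
                  min_call_rate: float = 0.8,
                  min_maf: float = 0.05) -> bool:
    """Keep a SNP only if it is called often enough and is polymorphic.

    The default thresholds are hypothetical, chosen for illustration.
    """
    s = snp_summary(genotypes)
    if s["maf"] is None:
        return False
    return s["call_rate"] >= min_call_rate and s["maf"] >= min_maf
```

For example, `snp_summary([0, 0, 1, 1, 2, None])` gives a call rate of 5/6, observed heterozygosity 0.4, and expected heterozygosity 0.48; a real analysis would apply such a filter across every SNP in the HapMap file.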
HapMap.hmc.txt.tar: HapMap hmc files generated by the UNEAK (Universal
Network Enabled Analysis Kit) filter with default settings.
HapMap.hmp.txt.tar: HapMap hmp file generated by the UNEAK (Universal
Network Enabled Analysis Kit) filter with default settings.
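As an illustration of how per-sample allele read counts, such as those in the hmc files, might be turned into genotype calls with an "ambiguous" category, here is a minimal Python sketch. It assumes each per-sample cell stores the read counts of the two alleles as `count1|count2`; the function names, the minimum depth of 6, and the rule that a single observed allele below that depth is ambiguous are hypothetical choices, not necessarily the filter described in the associated article. If both alleles of a true heterozygote are sampled with equal probability, the chance that d reads show only one allele is 2(0.5)^d, which motivates a depth cutoff.

```python
from typing import Optional, Tuple


def parse_counts(cell: str) -> Tuple[int, int]:
    """Split an assumed 'count1|count2' cell into two integer read counts."""
    a, b = cell.split("|")
    return int(a), int(b)


def het_dropout_probability(depth: int) -> float:
    """P(a true heterozygote shows only one allele) with `depth` reads,
    assuming each read samples either allele with probability 0.5."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 2 * 0.5 ** depth


def call_genotype(cell: str, min_depth: int = 6) -> Optional[str]:
    """Return 'AA', 'AB', 'BB', 'ambiguous', or None when there are no reads.

    min_depth is a hypothetical threshold for illustration only.
    """
    a, b = parse_counts(cell)
    depth = a + b
    if depth == 0:
        return None              # no data for this sample
    if a > 0 and b > 0:
        return "AB"              # both alleles observed
    if depth < min_depth:
        return "ambiguous"       # one allele seen, too few reads to exclude a heterozygote
    return "AA" if a > 0 else "BB"
```

With `min_depth=6`, `het_dropout_probability(6)` is 0.03125, so in this sketch a sample with six or more reads of a single allele is called homozygous, while `call_genotype("2|0")` is flagged as ambiguous.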
North America