10.5061/DRYAD.TMPG4F4VH
Funk, Stephan
University of La Frontera
Vega-Pla, Jose Luis
Laboratorio de Investigación Aplicada, Cría Caballar de las Fuerzas
Armadas, Cordoba, Spain
Luis, Cristina
University of Lisbon
Cothran, Gus
0000-0003-2791-4331
Texas A&M University
Juras, Rytis
0000-0002-7385-0618
Texas A&M University
Major inconsistencies of inferred population genetic structure estimated
in a large set of domestic horse breeds using microsatellites
Dryad
dataset
2020
STRUCTURE software
Horses
population genetic structure
2021-02-03T00:00:00Z
2020-08-26T00:00:00Z
en
https://doi.org/10.1002/ece3.6195
797865 bytes
3
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
STRUCTURE remains the most applied tool aimed at recovering the true, but
unknown, population structure from observed microsatellite data or other
genetic markers. About 30% of STRUCTURE-based studies could not be
reproduced (Gilbert et al., 2012). Here we use a large set of data from
2323 horses from 93 domestic breeds plus the Przewalski horse, typed at 15
microsatellite markers, to evaluate how program settings, in particular
the so far insufficiently evaluated number of replicates, impact the
estimation of the optimal number of population clusters Kopt that best
describe the observed data. Domestic horses are well suited as a test
case because there is extensive knowledge of the history of many breeds
and extensive phylogenetic analyses. Different methods based on different genetic
assumptions and statistical procedures (DAPC, FLOCK, PCoA and STRUCTURE
with different run scenarios) all revealed the general, broad-scale
relationships among the breeds that largely reflect known breed histories
but diverged largely in how they characterized small-scale patterns.
STRUCTURE failed to consistently identify Kopt using the most widespread
approach, the ΔK method, despite very large numbers of MCMC iterations
(3,000,000) and replicates (100). The interpretation of breed structure over
increasing numbers of K, without assuming a Kopt, was consistent with
known breed histories. The over-reliance on Kopt should be replaced by a
qualitative description of clustering over increasing K, which is
scientifically more honest and has the advantage of being much faster and
less computationally intensive, as lower numbers of MCMC iterations and
replicates suffice for stable results. Very large data sets are highly
challenging for cluster analyses, especially when populations with complex
genetic histories are investigated.
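For readers unfamiliar with the ΔK method mentioned above, the following
is a minimal sketch of how ΔK is commonly computed from the Ln P(D)
values of replicate STRUCTURE runs. It is not taken from this dataset or
from the authors' scripts. It assumes the widely used formulation in which
ΔK is the absolute second-order difference of the mean Ln P(D) across K,
divided by the standard deviation of Ln P(D) over replicates at K. The
function name delta_k and the numbers in example_lnp are hypothetical and
serve only as an illustration.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    lnp_by_k maps each tested K to a list of Ln P(D) values, one per
    replicate run. Assumed formulation (not confirmed by the source):
    delta K = |L(K+1) - 2*L(K) + L(K-1)| / sd(L(K)),
    where L(K) is the mean Ln P(D) over replicates at K.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        # delta K is undefined at the smallest and largest K tested.
        if k - 1 not in means or k + 1 not in means:
            continue
        spread = stdev(lnp_by_k[k])  # needs at least two replicates
        if spread == 0:
            # Identical replicates would give a division by zero; skip.
            continue
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / spread
    return result


# Hypothetical Ln P(D) values, two replicates per K (illustration only).
example_lnp = {
    1: [-100.0, -102.0],
    2: [-80.0, -82.0],
    3: [-78.0, -80.0],
    4: [-77.0, -79.0],
}

if __name__ == "__main__":
    for k, value in delta_k(example_lnp).items():
        print(k, round(value, 3))
```

Because the denominator is the replicate standard deviation, ΔK can shift
noticeably when the number of replicates is small or when runs converge
to different modes, which is one reason the number of replicates matters
for the stability of Kopt.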
Samples were collected during long-term studies on horse genetics. Horses
were typed at 15 autosomal microsatellite markers distributed on 14
chromosomes, from marker panels that are recommended for diversity studies
by ISAG-FAO and the International Society for Animal Genetics.