10.5061/DRYAD.J11FH
Long, Colby
The Ohio State University
Kubatko, Laura
The Ohio State University
Data from: The effect of gene flow on coalescent-based species-tree inference
Dryad
dataset
2018
Continuous-time Markov Chains
SVDQuartets
National Science Foundation
http://dx.doi.org/10.13039/100000001
DMS-1440386
2018-03-07T15:46:42Z
en
https://doi.org/10.1093/sysbio/syy020
33744 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Most current methods for inferring species-level phylogenies under the
coalescent model assume that no gene flow occurs following speciation.
Several studies have examined the impact of gene flow (e.g., Eckert and
Carstens (2008); Chung and Ané (2011); Leaché et al. (2014); Solís-Lemus
et al. (2016)) and of ancestral population structure (DeGiorgio and
Rosenberg, 2016) on the performance of species-level phylogenetic
inference, and analytic results have been proven for network models of
gene flow (e.g., Solís-Lemus et al. (2016); Zhu et al. (2016)). However,
there are few analytic results for a continuous model of gene flow
following speciation, despite the development of mathematical tools that
could facilitate such study (e.g., Hobolth et al. (2011); Andersen et al.
(2014); Tian and Kubatko (2016)). In this paper, we consider a three-taxon
isolation-with-migration model that allows gene flow between sister taxa
for a brief period following speciation, as well as variation in the
effective population sizes across the species tree. We derive the
probabilities of each of the three gene tree topologies under this model,
and show that for certain choices of the gene flow and effective
population size parameters, anomalous gene trees (i.e., gene trees that
are discordant with the species tree but that have higher probability than
the gene tree concordant with the species tree) exist. We characterize
the region of parameter space producing anomalous trees, and show that the
probability of the gene tree that is concordant with the species tree can
be arbitrarily small. We then show that there is theoretical support for
using SVDQuartets with an outgroup to infer the rooted three-taxon species
tree in a model of gene flow between sister taxa. We study the performance
of SVDQuartets on simulated data and compare it to three other commonly
used methods for species tree inference: ASTRAL, MP-EST, and
concatenation. The simulations show that ASTRAL, MP-EST, and concatenation
can be statistically inconsistent when gene flow is present, while
SVDQuartets performs well, though large sample sizes may be required for
certain parameter choices.
GeneFlowSupplement_Maple
This Maple worksheet contains computations and
examples to supplement the paper "The Effect of Gene Flow on
Coalescent-based Species-tree Inference." In particular, the formula
for the concordant triple frequency is derived, the formula is shown to
agree with the numerical computations from COALGF via an example, and the
limits referenced in the section "Analytic Results for the Isolation
with Migration Model" are computed explicitly.
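
For context on the concordant triple frequency and the notion of an anomalous gene tree described above, the following is a minimal Python sketch of the well-known baseline case with no gene flow. It is not the formula derived in the worksheet and it does not reproduce COALGF. It assumes the species tree ((A,B),C) and an internal branch length t measured in coalescent units; the function names triple_probs_no_gene_flow and is_anomalous are hypothetical and chosen only for illustration.

```python
import math


def triple_probs_no_gene_flow(t):
    """Gene tree topology probabilities for the species tree ((A,B),C)
    under the standard multispecies coalescent with NO gene flow.

    t is the internal branch length in coalescent units (an assumed
    parameterization; the paper's model adds migration and variable
    effective population sizes, which are not modeled here).
    """
    if t < 0:
        raise ValueError("branch length must be non-negative")
    # Lineages A and B fail to coalesce on the internal branch with
    # probability exp(-t); each of the three topologies is then
    # equally likely in the ancestral population.
    p_discordant = math.exp(-t) / 3.0
    return {
        "((A,B),C)": 1.0 - 2.0 * p_discordant,  # concordant topology
        "((A,C),B)": p_discordant,
        "((B,C),A)": p_discordant,
    }


def is_anomalous(probs, concordant="((A,B),C)"):
    """Return True if some discordant topology is strictly more
    probable than the concordant one."""
    return any(
        p > probs[concordant]
        for topology, p in probs.items()
        if topology != concordant
    )


if __name__ == "__main__":
    for t in (0.1, 0.5, 2.0):
        probs = triple_probs_no_gene_flow(t)
        print(t, probs, "anomalous:", is_anomalous(probs))
```

In this baseline the concordant probability never falls below 1/3, so no anomalous gene tree can occur for three taxa. The abstract's result is that, once gene flow between sister taxa and variable effective population sizes are allowed, this guarantee can fail.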