10.5061/DRYAD.4J1QT
Ashkenazy, Haim
Tel Aviv University
Sela, Itamar
National Institutes of Health
Levy Karin, Eli
Tel Aviv University
Landan, Giddy
Kiel University
Pupko, Tal
Tel Aviv University
Data from: Multiple sequence alignment averaging improves phylogeny reconstruction
Dryad
dataset
2018
Alignment reliability
multiple sequence alignment
Tree reconstruction
2018-05-10T15:33:36Z
2018-05-10T15:33:36Z
en
https://doi.org/10.1093/sysbio/syy036
4840991 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The classic methodology of inferring a phylogenetic tree from sequence
data is composed of two steps. First, a multiple sequence alignment (MSA)
is computed. Then, a tree is reconstructed assuming the MSA is correct.
Yet, inferred MSAs have been shown to be inaccurate, and alignment errors reduce
tree inference accuracy. It was previously proposed that filtering
unreliable alignment regions can increase the accuracy of tree inference.
However, it was also demonstrated that the benefit of this filtering is
often obscured by the resulting loss of phylogenetic signal. In this work,
we explore an approach in which, instead of relying on a single MSA, we
generate a large set of alternative MSAs and concatenate them into a
single SuperMSA. By doing so, we account for phylogenetic signals
contained in columns that are not present in the single MSA computed by
alignment algorithms. Using simulations, we demonstrate that this approach
results, on average, in more accurate trees compared to (1) using an
unfiltered MSA and (2) using a single MSA with weights assigned to columns
according to their reliability. Next, we explore in which regions of the
MSA space our approach is expected to be beneficial. Finally, we provide a
simple criterion for deciding whether or not the extra effort of computing
a SuperMSA and inferring a tree from it is beneficial. Based on these
assessments, we expect our methodology to be useful in many cases in
which divergent sequences are analyzed. The option to generate such a
SuperMSA is available at http://guidance.tau.ac.il
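The SuperMSA construction described above amounts to a column-wise concatenation of alternative alignments of the same sequences. The following is a minimal sketch, assuming the alternative MSAs have already been generated and are held as Python dicts mapping taxon names to gapped aligned sequences. The function name `concatenate_msas` and this data layout are hypothetical illustrations, not the GUIDANCE server's implementation; a tree would then be inferred from the returned SuperMSA with whatever reconstruction tool one normally uses.

```python
from typing import Dict, List

# Hypothetical layout: taxon name -> gapped aligned sequence ("-" marks a gap).
MSA = Dict[str, str]


def concatenate_msas(msas: List[MSA]) -> MSA:
    """Concatenate alternative MSAs of the same sequences into one SuperMSA.

    Each input MSA must contain the same taxa, have rows of equal length,
    and align the same underlying (ungapped) sequences.
    """
    if not msas:
        raise ValueError("need at least one MSA")
    taxa = sorted(msas[0])
    # Reference ungapped sequences, taken from the first alignment.
    raw = {t: msas[0][t].replace("-", "") for t in taxa}
    parts: Dict[str, List[str]] = {t: [] for t in taxa}
    for i, msa in enumerate(msas):
        if sorted(msa) != taxa:
            raise ValueError(f"MSA {i} has a different taxon set")
        if len({len(row) for row in msa.values()}) != 1:
            raise ValueError(f"MSA {i} has rows of unequal length")
        for t in taxa:
            # Alternative MSAs may differ only in gap placement.
            if msa[t].replace("-", "") != raw[t]:
                raise ValueError(f"MSA {i} aligns a different sequence for {t}")
            parts[t].append(msa[t])
    # Append the columns of every alternative MSA one after another.
    return {t: "".join(parts[t]) for t in taxa}


msa_a = {"human": "AC-GT", "mouse": "ACAGT"}
msa_b = {"human": "ACG-T", "mouse": "ACAGT"}
super_msa = concatenate_msas([msa_a, msa_b])
```

In this toy example, the two alternative alignments place the human gap in different columns; the SuperMSA keeps both placements, so columns absent from either single MSA still contribute to tree inference.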
ENSEMBLsim_dataset: The ENSEMBLsim simulated dataset examined in this study.
SuppMaterial_SuperMSA: Supplementary figures, tables, and text