10.5061/DRYAD.558
Mitchell, Andrew
University of Maryland, College Park
Mitter, Charles
University of Maryland, College Park
Regier, Jerome C.
Center for Agricultural Biotechnology, University of Maryland
Biotechnology Institute, College Park, Maryland 20742, USA
Data from: More taxa or more characters revisited: combining data from
nuclear protein-encoding genes for phylogenetic analyses of Noctuoidea
(Insecta: Lepidoptera)
Dryad
dataset
2009
combining data
Noctuidae
Noctuoidea
taxon sampling
independent genes
2009-06-15T21:22:04Z
2009-06-15T21:22:04Z
en
https://doi.org/10.1093/sysbio/49.2.202
160036 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
A central question concerning data collection strategy for molecular
phylogenies has been: is it better to increase the number of characters or
the number of taxa sampled to improve the robustness of a phylogeny
estimate? A recent simulation study concluded that increasing the number
of taxa sampled is preferable to increasing the number of nucleotide
characters, if taxa are chosen specifically to break up long branches. We
explore this hypothesis by using empirical data from noctuoid moths, one
of the largest superfamilies of insects. Separate studies of two nuclear
genes, elongation factor-1α (EF-1α) and dopa decarboxylase (DDC), have
yielded similar gene trees and high concordance with morphological
groupings for 49 exemplar species. However, support levels were quite low
for nodes deeper than the subfamily level. We tested the effects on
phylogenetic signal of (1) increasing the taxon sampling by nearly 60%, to
77 species, and (2) combining data from the two genes in a single
analysis. Surprisingly, the increased taxon sampling, although designed to
break up long branches, generated greater disagreement between the two
gene data sets and decreased support levels for deeper nodes. We appear to
have inadvertently introduced new long branches, and breaking these up may
require a yet larger taxon sample. Sampling additional characters
(combining data) greatly increased the phylogenetic signal. To contrast
the potential effect of combining data from independent genes with
collection of the same total number of characters from a single gene, we
simulated the latter by bootstrap augmentation of the single-gene data
sets. Support levels for combined data were at least as high as those for
the bootstrap-augmented data set for DDC and were much higher than those
for the augmented EF-1α data set. This supports the view that in obtaining
additional sequence data to solve a refractory systematic problem, it is
prudent to take them from an independent gene.
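
The bootstrap augmentation described above can be illustrated with a minimal sketch. It assumes that augmentation means resampling alignment columns (characters) with replacement from a single-gene data set until it reaches the character count of the combined data set. The record does not state which software or exact procedure the authors used, so the function name, the toy sequences, and the target length below are hypothetical.

```python
import random


def bootstrap_augment(alignment, target_length, seed=None):
    """Resample alignment columns with replacement up to target_length.

    alignment: dict mapping taxon name -> aligned sequence (equal lengths).
    Returns a new dict whose sequences each have target_length characters.
    Assumption: this mirrors "bootstrap augmentation" only in outline.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    if n_sites == 0:
        raise ValueError("alignment has no characters")

    rng = random.Random(seed)
    # Draw column indices once so every taxon receives the same resampled sites.
    columns = [rng.randrange(n_sites) for _ in range(target_length)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


# Hypothetical toy alignment standing in for a single-gene data set.
toy = {
    "taxon_A": "ACGTAC",
    "taxon_B": "ACGTTC",
    "taxon_C": "ATGTAC",
}
# Augment 6 characters to 12, as if matching a two-gene combined length.
augmented = bootstrap_augment(toy, target_length=12, seed=1)
```

Because the augmented characters are redrawn from one gene, they add no independent information, which is the contrast the abstract draws with combining data from two genes.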
Mitchell et al Data Set
mitchell.nexus