10.5061/DRYAD.27114
Chesters, Douglas
Institute of Zoology
Data from: Construction of a species-level tree of life for the insects
and utility in taxonomic profiling
Dryad
dataset
2016
Data mining
Phyloinformatics
tree of life
data integration
2016-10-27T04:37:25Z
2016-10-27T04:37:25Z
en
https://doi.org/10.1093/sysbio/syw099
45319577 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Although comprehensive phylogenies have proven an invaluable tool in
ecology and evolution, their construction is made increasingly challenging
by both the scale and structure of publicly available sequences. The
distinct partition between gene-rich (genomic) and species-rich (DNA
barcode) data is a feature of data that has been largely overlooked, yet
presents a key obstacle to scaling supermatrix analysis. I present a
phyloinformatics framework for draft construction of a species-level
phylogeny of insects (Class Insecta). Matrix-building requires separately
optimized pipelines for nuclear transcriptomic, mitochondrial genomic, and
species-rich markers, whereas tree-building requires hierarchical
inference in order to capture species-breadth while retaining deep-level
resolution. The phylogeny of insects contains 49,358 species, 13,865
genera, and 760 families. Deep-level splits largely reflected previous
findings for sections of the tree that are data rich or unambiguous, such
as inter-ordinal Endopterygota and Dictyoptera, the recently evolved and
relatively homogeneous Lepidoptera, Hymenoptera, Brachycera (Diptera), and
Cucujiformia (Coleoptera). However, analysis of bias, matrix construction
and gene-tree variation suggests confidence in some relationships (such as
in Polyneoptera) is less than has been indicated by the matrix bootstrap
method. To assess the utility of the insect tree as a tool in query
profiling, several tree-based taxonomic assignment methods are compared.
Using test data sets with existing taxonomic annotations, a tendency is
observed for greater accuracy of species-level assignments when using a
fixed comprehensive tree of life, in contrast to methods generating smaller
de novo reference trees. Described herein is a solution to the discrepancy
in the way data are fit into supermatrices. The resulting tree facilitates
wider studies of insect diversification and application of advanced
descriptions of diversity in community studies, among other presumed
applications.
pipeline_files.tar
Set of files required by the pipeline. Includes: InsectaCoreOrthologs
(insect core orthologs processed and in single file);
InsMito_sumtrees.procd (example summary tree from the mitogenome analysis,
which can be used to constrain a species level tree);
H03InsProf.muscle.fas (insect COI barcode profile based on Hebert et al.
2003); 12S_profile (set of 12S sequences); treePL_config_file
(configuration file for treePL)

transcriptomes_supermatrix.nex
Supermatrix in nexus format, built from nuclear orthologs taken from 33
transcriptomes, including all insect orders

mitogenome_supermatrix.nex
Supermatrix of protein coding genes of 806 mitogenomes

specieslevel_supermatrix.nex
Species-level insect supermatrix. In nexus format. 8 genes and 49338
species.

species_level_tree
Species level tree of life for insects, in Newick format. Can be input
into high capacity tree viewers and also used in further downstream
analysis

species_level_tree.processed
Species level tree in Newick format. Processed: has been rerooted and made
ultrametric. Can be read with high capacity tree viewing software.

Constraints (file: relational_constraints_tabdelim)
Set of relational constraints used in species level inference. Human
readable in tab delimited text format, with Linux newlines.

Development Commands and Notes (file:
ITOL1_Development_Commands_and_Notes.txt)
Text file containing all commands used during development of this project.
For user friendly implementation of the central protocol, see SOPHI
documents instead.

Online_Supplemental_Document_PDF
Online Supplement. PDF format. Contains supplementary methods, results,
discussion, references, figures and tables.

species_level_tree_HR
As per figure 4 of the paper, except high resolution with more complete
labeling. png image format.
instead.ITOL1_Development_Commands_and_Notes.txtOnline_Supplemental_Document_PDFOnline Supplement. PDF format. Contains supplementary methods, results, discussion, references, figures and tables.species_level_tree_HRas per figure 4 of the paper, except high resolution with more complete labeling. png image format.