10.5061/DRYAD.26J38
Allen, Julie M.
University of Illinois at Urbana Champaign
Boyd, Bret
University of Illinois at Urbana Champaign
Nguyen, Nam-Phuong
University of Illinois at Urbana Champaign
Vachaspati, Pranjal
University of Illinois at Urbana Champaign
Warnow, Tandy
University of Illinois at Urbana Champaign
Huang, Daisie I.
University of British Columbia
Grady, Patrick G. S.
University of Illinois at Urbana Champaign
Bell, Kayce C.
University of New Mexico
Cronk, Quentin C.B.
University of British Columbia
Mugisha, Lawrence
Conservation and Ecosystem Health Alliance
Pittendrigh, Barry R.
Michigan State University
Soledad Leonardi, M.
Instituto de Biología de Organismos Marinos, Centro Nacional Patagónico,
Puerto Madryn, Argentina
Reed, David L.
Florida Museum of Natural History
Johnson, Kevin P.
University of Illinois at Urbana Champaign
Data from: Phylogenomics from whole genome sequences using aTRAM
Dryad
dataset
2016
Brueelia antiqua
Osborniella crotophagae
Pedicinus badii
gene assembly
Haematopinus eurysternus
Degeeriella rufa
Pthirus gorillae
Neohaematopinus pacificus
Genome sequencing
Linognathus spicatus
Pthirus pubis
Pediculus humanus
Pediculus schaeffi
Proechinophthirus fluctus
aTRAM
Hoplopleura arboricola
Stimulopalpus japonicus
present day
Antarctophthirus microchir
Bothriometopus macrocnemus
Holocene
2016-11-08T14:17:15Z
2016-11-08T14:17:15Z
en
https://doi.org/10.1093/sysbio/syw105
8707841 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Novel sequencing technologies are rapidly expanding the size of data sets
that can be applied to phylogenetic studies. Currently the most commonly
used phylogenomic approaches involve some form of genome reduction. While
these approaches make assembling phylogenomic data sets more economical
for organisms with large genomes, they reduce the genomic coverage and
thereby the long-term utility of the data. Currently, for organisms with
moderate to small genomes (<1000 Mbp) it is feasible to sequence
the entire genome at modest coverage (10−30×). Computational challenges
for handling these large data sets can be alleviated by assembling
targeted reads, rather than assembling the entire genome, to produce a
phylogenomic data matrix. Here we demonstrate the use of automated Target
Restricted Assembly Method (aTRAM) to assemble 1107 single-copy ortholog
genes from whole genome sequencing of sucking lice (Anoplura) and
out-groups. We developed a pipeline to extract exon sequences from the
aTRAM assemblies by annotating them with respect to the original target
protein. We aligned these protein sequences with the inferred amino acids
and then performed phylogenetic analyses on both the concatenated matrix
of genes and on each gene separately in a coalescent analysis. Finally, we
tested the limits of successful assembly in aTRAM by assembling 100 genes
from close- to distantly related taxa at high to low levels of coverage.
Concatenated alignment and tree
Alignment and phylogenetic tree of the concatenated 1,101 exon DNA
alignment from 15 louse taxa. Genes were assembled from raw genomic DNA
with aTRAM, and exons were extracted and stitched together. Third codon
positions were removed because of base composition bias, and the tree was
built in RAxML.
Dataset_1.zip
Individual Gene Trees and Alignments
All 1,101 gene trees and alignments for the 15 taxon dataset. Each gene
was aligned using PASTA and UPP for fragmentary sequences. Each gene tree
was built using ASTRAL.
Dataset_2.zip
SupplementaryTable
DNA extraction and quality clean-up for each dataset. Illumina reads.
Alignments of each gene and the tree analysis.
Supplementary Figure
Box plot of the standard deviations away from the mean for each codon
position for each of the GTR rate parameters. The majority of the extreme
outliers fell above 10 standard deviations from the mean and were removed
from the analysis.
SupplementalFigure1.pdf
World Wide