10.5061/DRYAD.R2N70
Irisarri, Iker
University of Konstanz
Baurain, Denis
University of Liège
Brinkmann, Henner
Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures
Delsuc, Frédéric
French National Centre for Scientific Research
Sire, Jean-Yves
Sorbonne University
Kupfer, Alexander
Staatliches Museum für Naturkunde Stuttgart
Petersen, Jörn
Leibniz Institute DSMZ – German Collection of Microorganisms and Cell Cultures
Jarek, Michael
Helmholtz Centre for Infection Research
Meyer, Axel
University of Konstanz
Vences, Miguel
Technische Universität Braunschweig
Philippe, Hervé
French National Centre for Scientific Research
Data from: Phylotranscriptomic consolidation of the jawed vertebrate timetree
Dryad
dataset
2018
2018-06-20T00:00:00Z
2018-06-20T00:00:00Z
en
https://doi.org/10.1038/s41559-017-0240-5
1438606775 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Phylogenomics is extremely powerful but introduces new challenges as no
agreement exists on “standards” for data selection, curation and tree
inference. We use jawed vertebrates (Gnathostomata) as a model to address
these issues. Despite considerable efforts in resolving their evolutionary
history and macroevolution, few studies have included the full phylogenetic
diversity of gnathostomes and some relationships remain controversial. We
tested a novel bioinformatic pipeline to assemble large and accurate
phylogenomic datasets from RNA sequencing and find this
phylotranscriptomic approach successful and highly cost-effective.
Increased sequencing effort up to ca. 10 Gbp allows recovering more genes,
but shallower sequencing (1.5 Gbp) is sufficient to obtain thousands of
full-length orthologous transcripts. We reconstruct a robust and strongly
supported timetree of jawed vertebrates using 7,189 nuclear genes from 100
taxa, including 23 new transcriptomes from previously unsampled key
species. Gene jackknifing supports the robustness of our tree and allows
calculating genome-wide divergence times by overcoming gene sampling bias.
Mitochondrial genomes prove insufficient to resolve the deepest
relationships due to limited signal and among-lineage rate heterogeneity.
Our analyses emphasize the importance of large curated nuclear datasets to
increase the accuracy of phylogenomics and provide a reference framework
for the evolutionary history of jawed vertebrates.
Supplementary Methods, Tables and Figures
File: Irisarri_et_al_Supplement_combined.R2.pdf

Custom scripts
Description: detect-divergent-seq-ali.c, detect-problems-arb-3-9-15.c, ParalogDetector_V4-sort.sh, split-out-paralog.c
File: code-Vertebrata.tgz

Nuclear datasets
Description: 0DP gene set – 4593 gene alignments; 1DP gene set – 1162 gene alignments; 2DP gene set – 1434 gene alignments; nuclear test dataset
File: alignments.zip

ML tree NoDP dataset, RAxML GTR+G
Description: ML tree NoDP dataset, RAxML GTR+G, 100 non-parametric rapid bootstrap
File: misgen_50-RAXML-PROTGAMMAGTR-100xRAPIDBP.tre

ML tree 0DP dataset, LG+G+F
Description: ML tree 0DP dataset, RAxML LG+G+F, 100 non-parametric rapid bootstrap
File: misgen_50-RAXML-PROTGAMMALGF-100xRAPIDBP.tre.pdf

Mitochondrial datasets
Description: Concatenated mitochondrial proteins of jawed vertebrate (except ND6), either containing the total set of 106 taxa or a reduced set of 95 taxa after removing fastest-evolving taxa.
File: mtP-Gnatho-ND6_datasets.tgz

BI tree, mitochondrial dataset 95 taxa, PhyloBayes CAT+G
File: mtP-Gnatho_mi11F-ND6_strh04_R95_V2072-CAT_200_1k_con_root.ann.tre

BI tree, mitochondrial dataset 95 taxa, PhyloBayes CATGTR+G
File: mtP-Gnatho_mi11F-ND6_strh04_R95_V2072-CATGTR_10_1k_con_root.ann.tre

BI tree, mitochondrial dataset 106 taxa, PhyloBayes CAT+G
File: mtP-Gnatho-ND6_strh04_R106_V2086-CAT_25_1k_con_root.ann.tre

BI tree, mitochondrial dataset 106 taxa, PhyloBayes CATGTR+G
File: mtP-Gnatho-ND6_strh04_R106_V2086-CATGTR_10_1k_con_root.ann.tre

Species tree of the 0DP dataset, ASTRAL
Description: Species tree of the 0DP dataset, ASTRAL tree estimated on 4593 gene trees, branch support is measured by local posterior probabilities
File: ASTRAL_0DP-4593.tre

Species tree of the 1DP dataset, ASTRAL
Description: Species tree of the 1DP dataset, ASTRAL tree estimated on 1162 gene trees, branch support is measured by local posterior probabilities
File: ASTRAL_1DP-1162.tre

Species tree of the 2DP dataset, ASTRAL
Description: Species tree of the 2DP dataset, ASTRAL tree estimated on 1434 gene trees, branch support is measured by local posterior probabilities
File: ASTRAL_2DP-1434.tre

BI tree, NoDP dataset, PhyloBayes CAT+G
Description: Majority rule consensus from 100 BI analyses of 100 gene jackknife replicates (alignments with ~50,000 amino acid position each). PhyloBayes, CAT+G model
File: jack50000-0DP-all-CATG4.con.ann.tre

BI tree, 1DP dataset, PhyloBayes CAT+G
Description: Majority rule consensus from 100 BI analyses of 100 gene jackknife replicates (alignments with ~50,000 amino acid position each). PhyloBayes, CAT+G model
File: jack50000-1DP-all-CATG-A.con.ann.tre

BI tree, 2DP dataset, PhyloBayes CAT+G
Description: Majority rule consensus from 100 BI analyses of 100 gene jackknife replicates (alignments with ~50,000 amino acid position each). PhyloBayes, CAT+G model
File: jack50000-2DP-all-CATG-A.con.ann.tre

Genome-averaged timetree, PhyloBayes
Description: Timetree showing averaged dates across 100 timetrees, each estimated in PhyloBayes from 100 independent gene jackknife replicates. CATGTR+G substitution model, autocorrelated log-normal clock model, 16 cross-validated calibration points with soft bounds and birth-death tree prior
File: CATGTR-LN-BD-SB_100jacks.chronogram_mean_compCrI.tre

Timetree with 30 calibrations, nuclear test dataset, PhyloBayes
Description: Timetree estimated in PhyloBayes under CATGTR+G substitution model, autocorrelated log-normal clock model, 30 calibration points with soft bounds and birth-death tree prior
File: 14K_CATGTR-LN-BD-SB_all30.ch2_sample.chronogram.tre

Timetree with 16 calibrations, nuclear test dataset, PhyloBayes
Description: Timetree estimated in PhyloBayes under CATGTR+G substitution model, autocorrelated log-normal clock model, 16 cross-validated calibration points with soft bounds and birth-death tree prior
File: 14K_CATGTR-LN-BD-SB_CVed16b.ch2_sample.chronogram

Neoceratodus forsteri transcriptome
File: Neoceratodus_forsteri_transcriptome_trinity_oases.fa.gz

Megophrys nasuta transcriptome
File: Megophrys_nasuta_transcriptome_JP25.fasta.gz

Discoglossus pictus transcriptome
File: Discoglossus_pictus_transcriptome_JP15.fasta.gz

Andrias davidianus transcriptome
File: Andrias_davidianus_transcriptome_JP19.fasta.gz

Calotriton asper transcriptome
File: Calotriton_asper_transcriptome_JP21.fasta.gz

Lepidosiren paradoxa transcriptome
File: Lepidosiren_paradoxa_transcriptome_trinity_oases.fa.gz

Protopterus annectens transcriptome
File: Protopterus_annectens_transcriptome_trinity_oases.fa.gz

Geotrypetes seraphini transcriptome
File: Geotrypetes_seraphini_transcriptome_JP24.fasta.gz

Hymenochirus curtipes transcriptome
File: Hymenochirus_curtipes_transcriptome_JP17.fasta.gz

Pipa pipa transcriptome
File: Pipa_pipa_transcriptome_JP18.fasta.gz

Proteus anguinus transcriptome
File: Proteus_anguinus_transcriptome_JP22.fasta.gz

Siren lacertina transcriptome
File: Siren_lacertina_transcriptome_JP16.fasta.gz

Typhlonectes natans transcriptome
File: Typhlonectes_natans_transcriptome_JP23.fasta.gz

Acipenser baerii transcriptome
File: Acipenser_baerii_transcriptome.fasta.bz2

Amia calva transcriptome
File: Amia_calva_transcriptome.fasta.bz2

Lepisosteus platyrhincus transcriptome
File: Lepisosteus_platyrhincus_transcriptome.fasta.bz2

Pleurodeles waltl transcriptome
File: Pleurodeles_waltl_transcriptome.fasta.bz2

Polypterus senegalus transcriptome
File: Polypterus_senegalus_transcriptome.fasta.bz2

Protopterus aethiopicus transcriptome
File: Protopterus_aethiopicus_transcriptome.fasta.bz2

Raja clavata transcriptome
File: Raja_clavata_transcriptome.fasta.bz2

Rhinatrema bivittatum transcriptome
File: Rhinatrema_bivittatum_transcriptome.fasta.bz2

Scyliorhinus canicula transcriptome
File: Scyliorhinus_canicula_transcriptome.fasta.bz2

Tarentola mauritanica transcriptome
File: Tarentola_mauritanica_transcriptome.fasta.bz2

Typhlonectes compressicauda transcriptome
File: Typhlonectes_compressicauda_transcriptome.fasta.bz2

ML tree, mitochondrial dataset 106 taxa, RAxML GTR+G
File: mtP-Gnatho-ND6_strh04_R106_2773_RaxGTR4g_ML_Long.tre

ML tree, mitochondrial dataset 106 taxa, RAxML MTREV+G
File: mtP-Gnatho-ND6_strh04_R106_2773_RaxMtRevF4g_ML_Long.tre

ML tree, mitochondrial dataset 95 taxa, RAxML GTR+G
File: mtP-Gnatho_mi11F-ND6_strh04_R95_2866_RaxGTR4g_ML_Long.tre

ML tree, mitochondrial dataset 95 taxa, RAxML MTREV+G
File: mtP-Gnatho_mi11F-ND6_strh04_R95_2866_RaxMtRevF4g_ML_Long.tre
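
The gene jackknife replicates mentioned in this record (100 replicates of ~50,000 amino acid positions each) can be illustrated with a minimal sketch. It is not the authors' script. It assumes each replicate is built by drawing genes at random without replacement until the concatenated length reaches the target. The function name gene_jackknife, its parameters, and the toy gene lengths are all hypothetical.

```python
import random

def gene_jackknife(gene_lengths, target_positions=50_000, n_replicates=100, seed=0):
    """Return lists of gene names, one list per jackknife replicate.

    Assumption (not taken from the dataset): genes are drawn without
    replacement in random order until the summed alignment length
    reaches `target_positions`.
    """
    rng = random.Random(seed)
    genes = sorted(gene_lengths)  # fixed order so the seed is reproducible
    replicates = []
    for _ in range(n_replicates):
        order = genes[:]
        rng.shuffle(order)
        chosen, total = [], 0
        for gene in order:
            if total >= target_positions:
                break
            chosen.append(gene)
            total += gene_lengths[gene]
        replicates.append(chosen)
    return replicates

# Toy input: 1000 hypothetical genes, 200 to 500 amino acid positions each.
gene_lengths = {f"gene{i:04d}": 200 + (i % 7) * 50 for i in range(1000)}
replicates = gene_jackknife(gene_lengths)
```

Each replicate would then be concatenated into its own alignment and analysed separately, and the resulting trees or date estimates summarised across replicates.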