10.5061/DRYAD.3H4M5
Lee, Michael
South Australian Museum
Lee, Michael S. Y.
South Australian Museum
Data from: Multiple morphological clocks and total-evidence tip-dating in mammals
Dryad
dataset
2016
Placentalia
Mammalia
Bayesian phylogenetics
ClockstaR
relaxed clocks
tip-dating
Eutheria
Morphological Clocks
2016-05-31T17:38:39Z
2016-05-31T17:38:39Z
en
https://doi.org/10.1098/rsbl.2016.0033
51973900 bytes
2
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Morphological integration predicts that correlated characters will
coevolve; thus, each distinct suite of correlated characters might be
expected to evolve according to a separate clock or ‘pacemaker’.
Characters in a large morphological dataset for mammals were found to be
evolving according to seven separate clocks, each distinct from the
molecular clock. Total-evidence tip-dating using these multiple clocks
inflated divergence time estimates, but potentially improved topological
inference. In particular, single-clock analyses placed several
meridiungulates and condylarths in a heterodox position as stem
placentals, but multi-clock analyses retrieved a more plausible and
orthodox position within crown placentals. Several shortcomings (including
uneven character sampling) currently impact upon the accuracy of
total-evidence dating, but this study suggests that when sufficiently
large and appropriately constructed phenotypic datasets become more
commonplace, multi-clock approaches are feasible and can affect both
divergence dates and phylogenetic relationships.
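In this study, a clock is a set of partitions whose branch lengths on a shared topology differ only by an overall rate. These clocks were identified with ClockstaR [3] from per-partition branch lengths (Files B1 to B7). The Python sketch below is not ClockstaR's code. It illustrates a scale-invariant branch-score distance in the spirit of ClockstaR's sBSDmin, under one assumption: each partition's tree has been reduced to a vector of branch lengths listed in the same edge order on the fixed guide tree. Partitions sharing a pacemaker should give proportional branch lengths, and so a distance near zero, whatever their absolute rates. The function names and example values are hypothetical. ClockstaR's exact scaling and normalisation may differ. The clustering step is not shown (ClockstaR chooses the number of clusters with the gap statistic, as reported in File B7).

```python
import math
from typing import Dict, List, Sequence


def scaled_branch_score_distance(ref: Sequence[float], other: Sequence[float]) -> float:
    """Branch-score distance between `ref` and `other` after rescaling `other`.

    Both inputs are branch lengths on the same fixed topology, listed in the
    same edge order. The least-squares factor s = sum(a*b) / sum(b*b)
    minimises sum((a - s*b)**2), so trees with proportional branch lengths
    (same pattern, different overall rate) are at distance 0.
    """
    if len(ref) != len(other):
        raise ValueError("trees must have the same edges in the same order")
    denom = sum(b * b for b in other)
    if denom == 0.0:
        raise ValueError("`other` has no non-zero branch lengths")
    scale = sum(a * b for a, b in zip(ref, other)) / denom
    return math.sqrt(sum((a - scale * b) ** 2 for a, b in zip(ref, other)))


def distance_matrix(trees: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Symmetric pairwise distances: the mean of rescaling in each direction."""
    names: List[str] = list(trees)
    return {
        p: {
            q: 0.5 * (scaled_branch_score_distance(trees[p], trees[q])
                      + scaled_branch_score_distance(trees[q], trees[p]))
            for q in names
        }
        for p in names
    }


if __name__ == "__main__":
    # Hypothetical branch lengths for three partitions on one guide tree.
    trees = {
        "morph_a": [0.10, 0.20, 0.05, 0.30],
        "morph_b": [0.20, 0.40, 0.10, 0.60],  # morph_a at twice the rate
        "codon_3": [0.30, 0.05, 0.25, 0.02],  # a different branch-length pattern
    }
    for name, row in distance_matrix(trees).items():
        print(name, {other: round(d, 3) for other, d in row.items()})
```

The single-direction distance is not symmetric, because only one tree is rescaled. The matrix helper therefore averages both directions before any clustering.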
Description of All Data Files on Dryad
Description of All Data Files on Dryad (docx); this is also available on
the Biology Letters website. All numbered refs, such as [6], in the
descriptions below refer to citations in the main Biology Letters paper or
in this document.
Description.docx

A1_CharacterPartitions
File_A1 (Excel). Table describing the 28 partitions (25 morphological, 3
molecular) and listing the included characters. Character numbering is
based on the matrix from [6], available as MorphoBank Project 773
(http://dx.doi.org/10.7934/P773).

B1_TreeRef6_MrBayesFile_Morph1-25
File_B1 (plain text). The MrBayes [10] executable file to infer branch
lengths for
the 25 candidate morphological partitions, using the total-evidence
Maximum Likelihood topology from [6]. The matrix consists of only the 46
extant taxa from [6], since molecular branch lengths (to which these
morphological branch lengths have to be compared) cannot be ascertained
for extinct taxa.

B2_TreeRef6_MrBayesFile_Codons1-3
File_B2 (plain text). The MrBayes executable file to infer branch lengths
for the 3 candidate molecular partitions (and overall morphological branch
lengths), using the total-evidence Maximum Likelihood topology from [6].
The matrix consists of the 46 extant taxa from [6].

B3_TreeRef6_Parts1-28branchlengths_newick
File_B3 (newick format). The ClockstaR [3] treefile containing the trees
with the branch lengths for the 28 candidate partitions obtained from the
MrBayes analysis in files B1, B2. Note: ClockstaR treats all trees as
unrooted, so different rootings of trees are of no consequence.

B4_TreeRef9_MrBayesFile_Morph1-25
File_B4 (plain text). The
MrBayes [10] executable file to infer branch lengths for the 25 candidate
morphological partitions, using the topology from [9]. The matrix consists
of only the 46 extant taxa from [6], since molecular branch lengths (to
which these morphological branch lengths have to be compared) cannot be
ascertained for extinct taxa.

B5_TreeRef9_MrBayesFile_Codons1-3
File_B5 (plain text). The MrBayes executable file to infer branch lengths
for the 3 candidate molecular partitions (and overall morphological branch
lengths), using the topology from [9]. The matrix consists of the 46
extant taxa from [6].

B6_TreeRef9_Parts1-28branchlengths_newick
File_B6 (newick format). The ClockstaR treefile containing the trees with
the branch lengths for the 28 candidate partitions obtained from the
MrBayes analysis in files B4, B5. Note: ClockstaR treats all trees as
unrooted, so different rootings of trees are of no consequence.

B7_ClockstarResults
File_B7 (pdf). ClockstaR partition matrices and gap statistics for
analyses using the guide tree from ref [6] (upper panels) and ref [9]
(lower panels). From the 28 candidate partitions, both analyses identified
an optimal number of 8 clock-partitions (pacemakers) of similar
composition. Note that the order of partitions in these tables (molecular
partitions uppermost) is different from that in Fig. 1, but the same as in
File B10.

B8_Randomised_TOLMorph25p_TOL_Gam
File_B8 (plain text).
The MrBayes [10] executable file to infer branch lengths for the 25
randomised morphological partitions. The partitions were of the same size
as the 25 original partitions (42-553 characters); morphological
characters were randomly shuffled in Excel. The molecular data were not
randomised. The matrix is otherwise identical to that in File B1.

B9_Randomised_TreeRef6_Parts1-28branchlengths_newick
File_B9 (newick format). The ClockstaR [3] treefile containing the trees
with the branch lengths for the 25 randomised morphological partitions
(from the MrBayes analysis in File B8) and the original (not randomised)
3 molecular partitions (obtained from the MrBayes analysis as per File
B2). Note: ClockstaR treats all trees as unrooted, so different rootings
of trees are of no consequence.

B10_Randomised_clockstar_Results
File_B10 (pdf). ClockstaR partition matrices and gap statistics for the
analysis using randomised morphological partitions and the guide tree
from ref [6]. From the 28 candidate partitions, the analysis preferred
very many, or very few, morphological partitions, unlike the corresponding
analysis of the original data (File B7, top), which preferred an
intermediate number. Note that the order of partitions in these tables
(molecular partitions uppermost) is different from that in Fig. 1, but the
same as in File B7.

B11_ClockstaR_script
File_B11 (plain text). ClockstaR R script (and output) for the above
analysis. Note that filenames and paths need to be changed as appropriate.

C1_PartitionFinder
File_C1 (zipped, plain text).
Matrix with molecular data for 46 extant taxa (extracted from [6]), in
Phylip format (mammals.phy); PartitionFinder [12] command file (cfg) with
71 candidate partitions (by genes and by codons, with noncoding genes
treated as single candidate partitions); best scheme with 7 partitions
requiring separate substitution models, found by PartitionFinder using the
Bayesian Information Criterion with unlinked branch lengths.

C2_1clock - MrBayes file
File_C2 (plain text). MrBayes executable file for a total-evidence dating
analysis of all 86 taxa in the matrix in [6], using a single relaxed
(independent gamma rates) clock for all traits (morphological and
molecular). The sampled ancestor birth-death tree prior [17], and the
Markov model of morphological evolution [18,19], were used. Optimal
substitution models and substitution-model-partitions were found with
PartitionFinder for molecular data (see C1) and with stepping-stone
analysis in MrBayes for morphological data as implemented in [10].
Numerous (>20) MCMC runs were initially performed for 5 million
generations to investigate tuning, mixing and convergence. The final
analysis was then performed with 4 runs of 20 million generations, with
the first 30% of samples discarded as burnin.
C2_1clock.mrb

C3_1clock_PostburninTrees&Params_folder
File_C3
(zipped, plain text). The full MCMC tree and parameter output files from
the MrBayes analysis in File C2. Only post-burnin samples are included to
reduce file sizes. Run the MrBayes file in C2 (after disabling the MCMC
command and setting burnin to 0) to generate consensus trees and
statistics from these files.

C4_1clock_con_fig
File_C4 (approximate nexus format). The majority-rule consensus tree from
the MrBayes analysis in Files C2-3. Note: the wrong file was previously
uploaded onto Dryad; I thank Joseph Brown for pointing this out.

C5_1clock_topologyConvergence
File_C5 (pdf). Convergence diagnostics for tree topologies sampled in C2.
(A) AWTY [20] plots demonstrating similar posterior probabilities for all
clades across 4 runs, and (B) at different stages in a single run. The
topology convergence statistics from MrBayes and AWTY (standard deviation
of split frequencies across runs) were also good, i.e. low, being <0.031
across all comparisons (see top right cells of panel A). These patterns
are consistent with good convergence. The kink in one of the fitted lines
appears to be an artefact of a glitch in AWTY.

C6_1clock_paramConvergence
File_C6 (plain text). Convergence diagnostics for numerical parameters
from MrBayes [10] for the analysis in Files C2-4. PSRFs (potential scale
reduction factors, which compare between-run and within-run variance) are
approximately 1 for all parameters, consistent with the MCMC runs sampling
from the same posterior and with good convergence [10].

C7_8clocks MrBayes file
File_C7 (plain text). MrBayes
executable file for a total-evidence dating analysis of all 86 taxa in the
matrix in [6], using 8 separate relaxed clocks (independent gamma rates; 7
for morphology and 1 for molecular data), as found in the ClockstaR
analysis of File B3 (results in File B7). The sampled ancestor birth-death
tree prior [17], and the Markov model of morphological evolution [18,19],
were used. Optimal substitution models and substitution-model-partitions
were found with PartitionFinder for molecular data (see C1) and with
stepping-stone analysis in MrBayes for morphological data [10]. Numerous
(>20) MCMC runs were initially performed for 20 million generations to
investigate tuning, mixing and convergence. The final analysis was then
performed with 4 runs, and the trees from 10 million post-burnin
generations were retained. Note: the burnin for each run varies
considerably, from 20 million to 50 million generations, due to variation
in the time taken to reach (apparent) stationarity. For computational
efficiency, the 4 runs were performed separately and in all cases the last
10 million steps (after stationarity) were retained. However, the step
numbers have been readjusted in File C8 so that they are identical across
runs (i.e. to a common burnin of 50 million), to facilitate downstream
analyses, e.g. generating summary statistics in MrBayes.
C7_8clocks.mrb

C8_8clocks_PostburninTrees&Params_folder
File_C8 (zipped, plain text). The full MCMC tree and parameter output
files from the MrBayes analysis in File C7. Only post-burnin samples for
each of the 4 runs (the last 10 million steps; see File C7) are included
to reduce file sizes. Run the MrBayes file in C7 (after disabling the MCMC
command and setting burnin to 0) to generate consensus trees and
statistics.

C9_8clocks_con_fig
File_C9 (approximate nexus format). The majority-rule consensus tree from
the MrBayes analysis in Files C7-8.

C10_8clocks_topolConvergence
File_C10 (pdf). Convergence diagnostics for tree topologies sampled in C8.
(A) AWTY [20] plots demonstrating relatively good correlation (albeit with
substantial variance) for all clades across 4 runs, and (B) high variation
at different stages in a single run, but no obvious directional trends.
The topology convergence statistics from MrBayes and AWTY (standard
deviation of split frequencies across runs) were also relatively good,
i.e. <0.093 across all comparisons (see top right cells of panel A). These
results are consistent with convergence or near-convergence, i.e. runs
sampling similar distributions but cycling very slowly through parameter
space.

C11_8clocks_param_summary
File_C11 (plain text). Convergence diagnostics for numerical parameters
from MrBayes [10] for the analysis in Files C7-9. PSRFs (potential scale
reduction factors) are very close to 1 for most numerical parameters, but
approach ~1.7 for a single parameter (due to variance in 1 run). This
single outlier is slightly higher than desirable. These diagnostics do not
demonstrate convergence, though they are consistent with convergence being
approached.

C12_NodeAges
File_C12 (Word docx). Comparison of divergence dates obtained from the
single-clock and multi-clock analyses, and two previous studies [6,9].
Numerical dates were not published in [9] so dates were retrieved from a detailed time-tree (Figure S1) and are thus shown as estimated (e.g. ~56).
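File B8 describes randomised morphological partitions with the same sizes as the originals (42-553 characters), created by shuffling characters in Excel. The Python sketch below shows one reproducible way to make such a split. It is an illustration and not the procedure actually used. The function name, the seed and the toy sizes are hypothetical. The characters are shuffled once and then cut into consecutive blocks matching the original partition sizes, so every character lands in exactly one randomised partition.

```python
import random
from typing import List, Sequence


def random_partitions(characters: Sequence[int], sizes: Sequence[int],
                      seed: int = 1) -> List[List[int]]:
    """Randomly reassign characters to partitions of the given sizes."""
    if sum(sizes) != len(characters):
        raise ValueError("partition sizes must add up to the number of characters")
    shuffled = list(characters)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    parts: List[List[int]] = []
    start = 0
    for size in sizes:
        # Consecutive slices of the shuffled list become the new partitions.
        parts.append(sorted(shuffled[start:start + size]))
        start += size
    return parts


if __name__ == "__main__":
    # Toy example: 10 characters into partitions of 2, 3 and 5 characters.
    for part in random_partitions(range(1, 11), [2, 3, 5], seed=42):
        print(part)
```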
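File C1 reports that PartitionFinder [12] selected the best partitioning scheme using the Bayesian Information Criterion. The sketch below shows how that criterion trades fit against model size. It computes BIC = -2 ln L + k ln(n) for each candidate scheme and keeps the lowest. All log-likelihoods, parameter counts and the alignment length are invented for illustration and are not results from this study. Taking n as the number of alignment sites is an assumption about PartitionFinder's sample size.

```python
import math
from typing import Dict, Tuple


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian Information Criterion: lower values are preferred."""
    return -2.0 * log_likelihood + n_params * math.log(n_sites)


def best_scheme(schemes: Dict[str, Tuple[float, int]], n_sites: int) -> str:
    """Return the name of the scheme with the lowest BIC.

    `schemes` maps a scheme name to (log-likelihood, number of free parameters).
    """
    return min(schemes, key=lambda name: bic(*schemes[name], n_sites))


if __name__ == "__main__":
    # Invented numbers, not values from this study.
    candidate = {
        "unpartitioned": (-52000.0, 10),
        "by_gene": (-51200.0, 70),
        "by_gene_and_codon": (-51050.0, 710),
    }
    print(best_scheme(candidate, n_sites=10000))  # prints "by_gene"
```

Adding partitions always raises the likelihood here. The k ln(n) penalty is what stops the most finely split scheme from winning.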
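Files C5 and C10 summarise topological convergence as the standard deviation of split (clade) frequencies across runs. Below is a minimal Python sketch of an average standard deviation of split frequencies. It assumes each sampled tree has already been reduced to a set of split labels. Splits that never reach 10% frequency in any run are ignored, which is intended to mirror MrBayes's default minimum partition frequency. The use of the sample standard deviation is also an assumption, and MrBayes's exact computation may differ. The split labels are hypothetical.

```python
import statistics
from collections import Counter
from typing import FrozenSet, List, Sequence

Tree = FrozenSet[str]  # a sampled tree, reduced to the set of splits it contains


def split_frequencies(run: Sequence[Tree]) -> Counter:
    """Fraction of trees in one run that contain each split."""
    counts = Counter(split for tree in run for split in tree)
    return Counter({split: c / len(run) for split, c in counts.items()})


def average_sd_split_frequencies(runs: List[Sequence[Tree]],
                                 min_freq: float = 0.10) -> float:
    """Mean, over splits, of the standard deviation of split frequency across runs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs")
    freqs = [split_frequencies(run) for run in runs]
    sds = []
    for split in set().union(*freqs):
        per_run = [f[split] for f in freqs]  # a Counter returns 0 for absent splits
        if max(per_run) >= min_freq:
            sds.append(statistics.stdev(per_run))
    return statistics.fmean(sds) if sds else 0.0


if __name__ == "__main__":
    # Two toy runs of four trees each; run 2 disagrees about one clade.
    run1 = [frozenset({"AB|CDE", "DE|ABC"})] * 4
    run2 = [frozenset({"AB|CDE", "DE|ABC"})] * 2 + [frozenset({"AC|BDE", "DE|ABC"})] * 2
    print(round(average_sd_split_frequencies([run1, run2]), 4))
```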
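Files C6 and C11 report potential scale reduction factors (PSRFs). The sketch below computes the textbook Gelman-Rubin statistic for one parameter from several equal-length runs. It compares the variance between run means with the average within-run variance. Values close to 1 suggest that the runs are sampling the same distribution. MrBayes's own implementation may differ in detail, for example in small-sample corrections. Treat this as an illustration rather than a way to reproduce the reported values exactly.

```python
import math
import statistics
from typing import Sequence


def psrf(runs: Sequence[Sequence[float]]) -> float:
    """Gelman-Rubin potential scale reduction factor for one parameter."""
    m = len(runs)
    n = len(runs[0]) if runs else 0
    if m < 2 or n < 2 or any(len(r) != n for r in runs):
        raise ValueError("need at least 2 runs of equal length >= 2")
    means = [statistics.fmean(r) for r in runs]
    grand_mean = statistics.fmean(means)
    # B: between-run variance of the means (scaled by n); W: mean within-run variance.
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = statistics.fmean(statistics.variance(r) for r in runs)
    if w == 0.0:
        raise ValueError("within-run variance is zero")
    pooled = (n - 1) / n * w + b / n  # pooled estimate of the posterior variance
    return math.sqrt(pooled / w)


if __name__ == "__main__":
    mixed = [[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 3.5, 2.0]]
    stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]
    print(round(psrf(mixed), 3), round(psrf(stuck), 3))
```

Runs stuck in different regions inflate B and push the PSRF well above 1. This is the pattern behind the single outlier of ~1.7 noted for File C11.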
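File C7 explains that each run's retained samples (the last 10 million steps) were renumbered so that all runs appear to share a common burnin of 50 million. Below is a minimal Python sketch of that shift for a parameter output file. It assumes a tab-separated layout in which data rows begin with the generation number and all other lines, such as an ID line and the column header, do not. The function name, the toy rows and the choice that the first retained sample becomes exactly `common_start` are hypothetical. The tree output files would need the same offset applied to their generation labels.

```python
from typing import Iterable, List, Optional


def rebase_generations(lines: Iterable[str], common_start: int) -> List[str]:
    """Shift generation numbers so the first retained sample is `common_start`.

    Lines whose first tab-separated field is not an integer (e.g. an ID line
    or the column header) are passed through unchanged.
    """
    out: List[str] = []
    offset: Optional[int] = None
    for line in lines:
        first, sep, rest = line.partition("\t")
        if not first.strip().isdigit():
            out.append(line)
            continue
        gen = int(first)
        if offset is None:
            offset = common_start - gen  # fixed from the first data row
        out.append(f"{gen + offset}{sep}{rest}")
    return out


if __name__ == "__main__":
    # Toy rows from a run whose burnin ended at 20 million generations.
    rows = ["[ID: 123]", "Gen\tLnL", "20000000\t-1000.5", "20001000\t-998.2"]
    for row in rebase_generations(rows, 50_000_000):
        print(row)
```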