10.5061/DRYAD.3N5TB2RJ8
Ivan, Jeremias
0000-0003-4599-0207
Indonesia International Institute for Life Sciences
Moritz, Craig
Australian National University
Potter, Sally
Australian National University
Bragg, Jason
Royal Botanic Garden Sydney
Turakulov, Rust
National Cancer Institute
Hua, Xia
Australian National University
Temperature predicts the rate of molecular evolution in Australian
Eugongylinae skinks
Dryad
dataset
2021
FOS: Biological sciences
Australian Research Council
https://ror.org/05mmh0f86
FL110100104
2021-08-27T00:00:00Z
2021-08-27T00:00:00Z
en
300439485 bytes
2
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Temperature differences over time and space has been hypothesized to cause
variation in the rate of molecular evolution of species, but empirical
evidence is mixed. To further test this hypothesis, we utilized a large
exon-capture sequence data of Australian Eugongylinae skinks, exemplifying
a radiation of temperature-sensitive ectotherms spanning a large
latitudinal gradient. The association between temperature (and other
species traits) and long-term substitution rate was assessed based on 1268
sequenced exons of 44 species pairs from the Eugongylinae subfamily using
regression analyses. Temperature is the strongest, positively-correlated
predictor of variation in substitution rate across the Australian
Eugongylinae. It explains 45% of variation in synonymous substitution
rate, and 11% after controlling for all the other factors. Synonymous
substitution rate is also negatively associated with body size, with 6%
variation explained by body size after controlling for the effects of
temperature. Other factors are not associated with synonymous substitution
rate after controlling for temperature. Overall, this study points to
temperature as a strong predictor of the molecular evolution rate in the
Eugongylinae subfamily, and demonstrates the power of large-scale exonic
data to identify correlates of the rate of molecular evolution.
Exon sequences, generated using a custom exon-capture system (Bragg et al.
2016) were obtained from Blom et al. (2016), Bragg et al. (2018) and
unpublished data. In brief, the raw sequencing reads were quality-assessed
and assembled de novo as haplotypes for each individual following the
pipelines from Singhal (2013) and Bragg et al. (2016, 2018). We note that
this workflow involves mapping cleaned sequencing reads to a set of
reference sequences that were assembled for the same sample (Bowtie
2, version 2.2.2, Langmead and Salzberg 2012) and calling genotypes
appropriate for the estimation of individual level heterozygosity (using
GATK release gb82c674, McKenna et al. 2010, and sites with a genotype
quality, GQ, greater than 20). We included 435 samples (and 8 outgroups)
from 135 species of Eugongylinae subfamily that had less than 1000 missing
loci out of the initial set of 3171 loci. Using EAPhy v.1.2 (Blom 2015),
we first checked correct amino acid coding for all the aligned haplotypes
(one per individual per locus), removed loci with more than 20% missing
data and/or which had less than 200bp. The filtered sequence alignments (N
= 1782 loci) were further cross-checked by eye using Geneious Prime
2020.1.1 to ensure correct reading frame and no stop codons in the middle
of alignments. Then, the gene tree of each locus was estimated using
IQTree v1.6.11 (Nguyen et al. 2014) with the MG codon model with
additional transition/transversion rate ratio, as selected by
ModelFinder (Kalyaanamoorthy et al. 2017). The gene trees and alignments
were examined in TreeShrink (Mai and Mirarab 2018) to filter out loci in
which some species had abnormally long branches, resulting in 1268 loci
that passed the default parameters. Finally, the alignments that passed
this filter concatenated using EAPhy.
ALL_concat_loci.fasta Sequence alignment of 435 samples (and 8 outgroups)
from 135 species of Eugongylinae subfamily in FASTA format. The README
file contains additional information of the data.