10.5061/DRYAD.JN8NT
Costa, Igor Rodrigues
Federal University of Rio de Janeiro
Prosdocimi, Francisco
Federal University of Rio de Janeiro
Jennings, W. Bryan
Federal University of Rio de Janeiro
Data from: In silico phylogenomics using complete genomes: a case study on
the evolution of hominoids
Dryad
dataset
2017
speciation times
anchored enrichment loci
neutral loci
multi-locus coalescent analyses
Homo sapiens
Pan troglodytes
UCE loci
Gorilla gorilla
hominoids
anonymous loci
independent loci
Pongo abelii
ancestral population size
Miocene
2017-07-14T00:00:00Z
2017-07-14T00:00:00Z
en
https://doi.org/10.1101/gr.203950.115
5021874 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The increasing availability of complete genome data is facilitating the
acquisition of phylogenomic datasets, but the process of obtaining
orthologous sequences from other genomes and assembling multiple sequence
alignments remains piecemeal and arduous. We designed software that
performs these tasks and outputs anonymous loci (AL) or anchor loci
(AE/UCE) datasets in ready-to-analyze formats. We demonstrate our program
by applying it to the hominoids. Starting with human, chimpanzee, gorilla,
and orangutan genomes, our software generated an exhaustive dataset of 292
ALs (~1 kb each) in ~3 hours. Analyses of our AL dataset not only
validated the program by yielding a portrait of hominoid evolution in
agreement with previous studies, but also produced estimates of ancestral
effective population sizes and speciation times whose accuracy and
precision represent improvements. We also used our program with a published set of
512 vertebrate-wide AE 'probe' sequences to generate datasets
consisting of 171 and 242 independent loci (~1 kb each) in 11 and 13
minutes, respectively. The former dataset consisted of flanking sequences
500 bp from adjacent AEs, while the latter contained sequences bordering
AEs. Although our AE datasets produced the expected hominoid species tree,
coalescent-based estimates of ancestral population sizes and speciation
times based on these data were considerably lower than estimates from our
AL dataset and previous studies. Accordingly, we suggest that loci
subjected to direct or indirect selection may not be appropriate for
coalescent-based methods. Complete in silico approaches, combined with the
burgeoning genome databases, will accelerate the pace of phylogenomics.
AE171_data_files
This folder contains AE171 data saved in different file formats:
AE171_concatenated.phylip contains all 171 AE loci concatenated into a supermatrix for species tree analyses.
AE171_fasta singles folder contains 171 AE loci as individual fasta-formatted files.
AE171_multi.nexus file contains 171 AE loci together in a single nexus-formatted file with BEST and MrBayes commands for species tree analyses.
AE171_multi.phylip file contains 171 AE loci together in a single phylip-formatted file (the file used in BP&P2.2 analyses).
AL171_nexus_singles folder contains 171 AE loci as individual nexus-formatted files.

AE242_data_files
This folder contains AE242 data saved in different file formats:
AE242_concatenated.phylip contains all 242 AE loci concatenated into a supermatrix for species tree analyses.
AE242_fasta singles folder contains 242 AE loci as individual fasta-formatted files.
AE242_multi.nexus file contains 242 AE loci together in a single nexus-formatted file with BEST and MrBayes commands for species tree analyses.
AE242_multi.phylip file contains 242 AE loci together in a single phylip-formatted file (the file used in BP&P2.2 analyses).
AL242_nexus_singles folder contains 242 AE loci as individual nexus-formatted files.

AL292_data_files
This folder contains AL292 data saved in different file formats:
AL292_concatenated.phylip contains all 292 loci concatenated into a supermatrix for species tree analyses.
AL292_fasta singles folder contains 292 single-copy, presumably neutral, and independent anonymous loci as individual fasta-formatted files.
AL292_multi.nexus file contains 292 single-copy, presumably neutral, and independent anonymous loci together in a single nexus-formatted file with BEST and MrBayes commands for species tree analyses.
AL292_multi.phylip file contains 292 single-copy, presumably neutral, and independent anonymous loci together in a single phylip-formatted file (the file used in BP&P2.2 analyses).
AL292_nexus_singles folder contains 292 single-copy, presumably neutral, and independent anonymous loci as individual nexus-formatted files.

HKY208_data_files
This folder contains HKY208 data saved in different file formats:
HKY208_multi.phylip contains the 208 AL loci from the AL292 dataset that fit the HKY model of substitution. These HKY-only data were analyzed separately to evaluate the effects of model selection on our results.
HKY208_nexus_singles folder contains 208 single-copy, presumably neutral, and independent anonymous loci that fit the HKY model as individual nexus-formatted files.
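
The relationship between the per-locus fasta files and the concatenated supermatrix files described above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names, the taxon labels, and the relaxed PHYLIP layout are all assumptions made for illustration, and it further assumes that every locus alignment contains the same set of taxa.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA-formatted text into {sequence name: sequence}."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def concatenate_loci(loci: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-locus alignments end to end into one supermatrix.

    Assumption: every locus has the same taxa, and all sequences
    within a locus are already aligned to the same length.
    """
    if not loci:
        raise ValueError("no loci supplied")
    taxa = sorted(loci[0])
    pieces: Dict[str, List[str]] = {t: [] for t in taxa}
    for index, locus in enumerate(loci):
        if sorted(locus) != taxa:
            raise ValueError(f"locus {index} has a different set of taxa")
        if len({len(seq) for seq in locus.values()}) != 1:
            raise ValueError(f"locus {index} is not aligned")
        for taxon in taxa:
            pieces[taxon].append(locus[taxon])
    return {t: "".join(parts) for t, parts in pieces.items()}


def to_phylip(alignment: Dict[str, str]) -> str:
    """Write an alignment in relaxed sequential PHYLIP format."""
    n_sites = len(next(iter(alignment.values())))
    lines = [f"{len(alignment)} {n_sites}"]
    for taxon, seq in alignment.items():
        lines.append(f"{taxon}  {seq}")
    return "\n".join(lines) + "\n"


# Hypothetical two-locus, two-taxon example (labels are illustrative only).
locus_a = parse_fasta(">human\nACGT\n>chimp\nACGA\n")
locus_b = parse_fasta(">human\nGG\n>chimp\nGC\n")
supermatrix = concatenate_loci([locus_a, locus_b])
phylip_text = to_phylip(supermatrix)
```

In practice each locus would be read from one of the individual fasta files; the sketch keeps the input in memory so it runs without the dataset present.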
Africa