10.5061/DRYAD.S7H44J17V
McCullough, Jenna
0000-0002-7664-3868
University of New Mexico
Oliveros, Carl
University of Kansas
Benz, Brett
University of Michigan–Flint
Zenil-Ferguson, Rosana
University of Hawaii System
Cracraft, Joel
American Museum of Natural History
Moyle, Robert
University of Kansas
Andersen, Michael
0000-0002-7220-5588
University of New Mexico
Wallacean and Melanesian islands promote higher rates of diversification
within the global passerine radiation Corvides: Supplementary information
Dryad
dataset
2021
FOS: Biological sciences
UCE
supermatrix phylogeny
Biogeography
state-dependent diversification
Passerines
Corvides
National Science Foundation
https://ror.org/021nxhr62
DEB 1557053
National Science Foundation
https://ror.org/021nxhr62
DEB 1241181
National Science Foundation
https://ror.org/021nxhr62
DEB 2112467
2022-06-23T00:00:00Z
2022-06-23T00:00:00Z
en
https://doi.org/10.1093/sysbio/syac044
https://github.com/JennaMcCullough/Corvides-SSE
https://doi.org/10.5281/zenodo.7217480
290016000 bytes
7
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The complex island archipelagoes of Wallacea and Melanesia have provided
empirical data behind integral theories in evolutionary biology, including
allopatric speciation and island biogeography. Yet, questions regarding
the relative impact of the layered biogeographic barriers, such as
deep-water trenches and isolated island systems, on faunal diversification
remain underexplored. One such barrier is Wallace’s Line, a significant
biogeographic boundary that largely separates Australian and Asian
biodiversity. To assess the relative roles of biogeographic
barriers—specifically isolated island systems and Wallace’s Line—we
investigated the tempo and mode of diversification in a diverse avian
radiation, Corvides (Crows and Jays, Birds-of-paradise, Vangas, and
allies). We combined a genus-level dataset of thousands of ultraconserved
elements (UCEs) and a species-level, 12-gene Sanger sequence matrix to
produce a well-resolved supermatrix tree that we leveraged to explore the
group’s historical biogeography and effects of biogeographic barriers on
their macroevolutionary dynamics. The tree is well-resolved and differs
substantially from what has been used extensively for past comparative
analyses within this group. We confirmed that Corvides, and its major
constituent clades, arose in Australia and a burst of dispersal west
across Wallace’s Line occurred after the uplift of Wallacea during the
mid-Miocene. We found that dispersal across this biogeographic barrier
were generally rare, though westward dispersals were two times more
frequent than eastward dispersals. Wallacea’s central position between
Sundaland and Sahul no doubt acted as a bridge for island-hopping
dispersal out of Australia, across Wallace’s Line, to colonize the rest of
Earth. In addition, we found that the complex island archipelagos east of
Wallace’s Line harbor the highest rates of net diversification and are a
substantial source of colonists to continental systems on both sides of
this biogeographic barrier. Our results support emerging evidence that
island systems, particularly the geologically complex archipelagoes of the
Indo-pacific, are drivers of species diversification.
Sampling We generated two datasets: a genus-level matrix of ultraconserved
elements (UCEs; Faircloth et al. 2012), and a species-level supermatrix of
12 Sanger-sequenced genes. We sampled 86 passerine taxa for our UCE
matrix, which comprised 77 species (76 genera) representing 29 of the 30
families within Corvides (IOC v.8.2; Gill and Donsker 2018); we lacked
only the Bornean endemic family, Pityriasidae (Table S1). We diverged from
IOC v.8.2 taxonomy by treating Platylophus, which has been traditionally
considered within Corvidae, as its own monotypic family, Platylophidae,
(Aggerbeck et al. 2014; Winkler et al. 2015; Oliveros et al. 2019). We
also sampled nine outgroups from the following families: Meliphagidae,
Orthonychidae, Pomatostomidae, Cnemophilidae, Petroicidae, Cisticolidae,
Stenostiridae, Passeridae, and Chloropseidae. For our species-level
supermatrix, we included up to 12 Sanger-sequenced gene regions from 728
species, representing 87% of all Corvides species (Table S2–3). The
majority of these data were downloaded from GenBank. For the 86 species in
the UCE matrix, we mined off-target, mitochondrial sequencing reads and
combined these with multilocus nuclear data from GenBank. Finally, we
sequenced de novo RAG-1 and RAG-2 for some individuals (Table S2–3). The
UCE and multilocus datasets are mutually exclusive and are linked by the
86 individuals that were shared between the two datasets. Laboratory
techniques We extracted and purified DNA from fresh muscle or liver tissue
or toepad samples from museum specimens using the Qiagen DNeasy Blood and
Tissue Kit following the manufacturer’s protocol. We quantified DNA
extracts with a Qubit 2.0 Fluorometer and sheared 500 ng of DNA with a
Covaris S220 sonicator (175 W peak incident power, 2% duty factor, and 200
cycles per burst for 45 seconds). Following established protocols
(Faircloth et al. 2012, McCormack et al. 2016), we performed end repair,
A-tailing, adapter ligation, and amplification on sheared DNA using the
Kapa Biosystems Library Prep kit. We deviated from this protocol in three
ways: we used ¼ volume of reagents as called for in the library prep kit,
we ligated universal iTru stubs instead of sample-specific adapters to
permit dual-indexing, and we added a second AMPure XP bead cleanup at 1.0X
volume after stub ligation. We added dual-indexed iTru adapters
(baddna.org) to DNA fragments in a 17-cycle PCR using NEB Phusion
High-Fidelity PCR Master Mix. We combined libraries in pools of eight
equimolar samples and enriched pooled libraries for 24 hours using the
Arbor Biosciences MYbaits kit for Tetrapods UCE-5Kv1, which targets 5,060
UCE loci. After post-enrichment amplification, we quantified libraries
using an Illumina Eco qPCR System and sequenced them on a high output
paired-end run of 100 cycles on an Illumina HiSeq 2500 System at the
University of Kansas Genome Sequencing Core. We multiplexed 96 individuals
on a single Illumina lane, including 10 samples for other projects. UCE
Data Assembly We processed UCE data using the Python bioinformatics
pipeline PHYLUCE (Faircloth 2016). Raw reads were de-multiplexed using
CASAVA v.1.8.2. To trim adapter-sequences and low-quality bases from
reads, we used the parallel wrapper Illumiprocessor v.1 (Faircloth 2013)
around Trimmomatic v.0.32 (Bolger et al. 2014). We assembled cleaned reads
into contigs using Trinity v.r2013.08.14 (Grabherr et al. 2011) and
extracted contigs matching UCE loci. We aligned loci with Mafft v.7.130b
(Katoh and Standley 2013), allowing missing nucleotides at the flanks of
the alignment only if at least 65% of taxa contained data, the default
option in PHYLUCE. We trimmed alignments using Gblocks v.0.91b (Castresana
2000) and default parameters with the exception of the minimum number of
sequences for a flank position in Gblocks, which we set at 65% of taxa. We
generated a complete dataset (composed of loci common to all taxa) and a
75%-complete dataset (for which at least 64/86 taxa were represented per
UCE locus). Phylogenetic Analyses of UCE Data We performed maximum
likelihood (ML) inference on the concatenated loci of the 75%-complete
dataset using RAxML v.8.1.3 (Stamatakis 2014) assuming a general
time-reversible model of rate substitution and gamma-distributed rates
among sites (GTR+G). We evaluated node support using 1000 rapid
bootstraps. Next, we used gene-tree-based coalescent methods (GCM) to
estimate Corvides relationships applying strategies of subsampling taxa,
using only the most informative loci, and separately analyzing samples
with short sequences with trimmed alignments to increase resolution and
consistency between estimates. We performed coalescent analysis on a total
of five subsets of the 86 species. The first subset contained 21 species
that focused on the early branches and major superfamilies in Corvides.
The second subset consisted of the same species but also included Mohoua
albicilla—the only sample that contained substantially lower mean contig
length compared to the rest of the samples—because it was derived from a
toepad sample. Using Gblocks and following best practices of Hosner et al.
(2016), we trimmed this second subset alignment so that no missing data
were present at their flanks (i.e., to match the reduced length of M.
albicilla). The remaining three subsets focused on the three superfamily
clades recovered by Aggerbeck et al. (2014; Fig. 1C). We refer to these
groups as “Orioloidea,” “Malaconotoidea,” and “Corvoidea”, corresponding
to clades X, Y, and Z of Aggerbeck et al. (2014), respectively. In
summary, these last three subsets comprised 25, 25, and 43 species from
Orioloidea, Malaconotoidea, and Corvoidea, respectively. We removed
uninformative loci for GCM analyses based on whether a locus’ number of
parsimony-informative sites and bootstrap support was lower than the mean
for that dataset (Table S8). For coalescent analyses, gene-tree inference
and bootstrapping were performed in RAxML v.8.1.3 (Stamatakis 2014), as
implemented in PHYLUCE (Faircloth 2016). We modified the PHYLUCE scripts
to implement multi-locus bootstrapping (i.e., sampling with replacement of
loci and sites; Seo 2008) and generated 500 multi-locus, bootstrap
replicate sets of gene trees for each dataset. On each replicate set of
gene trees, we performed coalescent analyses using four GCMs: STAR and
STEAC, as implemented in the R package phybase v.1.3 (Liu et al. 2009b),
MP-EST v.1.4 (Liu et al. 2010), and ASTRAL v.4.7.7 (Mirarab et al. 2014).
All programs were run with default options. Species trees inferred from
the bootstrap replicates were summarized with 70% consensus trees using
SumTrees v3.3.1, part of the DendroPy 3.12.0 package (Sukumaran and Holder
2010, 2015). Supermatrix Assembly We assembled a species-level data
matrix using Sanger sequence data available on GenBank. Following the
taxonomy of the International Ornithological Congress World Bird List (IOC
v.8.2; Gill and Donsker 2018), we downloaded sequence data for 12 loci,
which comprised four mitochondrial (COI, Cyt-b, ND2, and ND3) and eight
nuclear (c-mos, Fib-5, GAPDH, Myo2, ODC, RAG-1, RAG-2, and TGFb2) for 728
species (Table S2). These 12 gene regions are widely used in higher-level
avian systematics (Groth and Barrowclough 1999; Hackett et al. 2008; Moyle
et al. 2009a; Tello et al. 2009) and specifically to infer relationships
in Corvides (Cicero and Johnson 2001; Filardi and Moyle 2005; Irestedt et
al. 2008; Nyári et al. 2009; Jønsson et al. 2011, 2016; Slager et al.
2014; Andersen et al. 2015a). We preferentially chose sequences from a
single vouchered specimen and attempted to include as many taxa with
sequence data that were available on GenBank as of September 2018.
Additionally, we incorporated new RAG-1 and RAG-2 sequences (see Barker et
al. 2004 for established amplification and sequencing protocol) and
off-target mitogenome reads of UCE data, following Andersen et
al. (2019) (Table S2). Due to the piecemeal nature of supermatrices, not
all taxa had genetic data for all gene regions. Our dataset averaged 5.2
+/- 3.1 gene regions per taxon. At the low end, ca. 10% of tips (79/728)
were represented by a single gene region, whereas at the high end, ca. 10%
(77/728) were represented by ≥10 gene regions. We aligned gene-specific
datasets using Muscle (Edgar 2004) in Geneious v.R9 (Kearse et al. 2012).
We explored this dataset extensively for errors of sample
misidentification by inspecting alignments by eye and as evidenced by
conflicting gene trees. We used this dataset to estimate a UCE-constrained
topology of all Corvides. Phylogenetic Analysis of Supermatrix and Time
Calibration We used BEAST v2.5.1 (Bouckaert et al. 2014) on the CIPRES
Science Gateway (Miller et al. 2010) to estimate a UCE-constrained, time
calibrated phylogeny of the full 728-taxon Sanger matrix. To constrain the
topology to the UCE backbone, we set up 70 monophyly statements in BEAUti
for well-supported, higher-level nodes in the UCE constraint tree but
allowed species-level relationships to be inferred based on the 12-gene
matrix. In practice, we ensured that tip names of the 86 UCE samples were
identical to their corresponding tips in the supermatrix so that BEAST
recognized them as equals in the constraint analysis. For uncertain nodes
in our backbone dataset, we followed the topology of Oliveros et al.
(2019). Rather than independently calibrate a heterogeneous dataset, we
incorporated secondary calibration points drawn from Oliveros et al.
(2019) on 30 of the monophyly-constrained nodes. Oliveros et al. (2019)
inferred their higher-level Passeriformes tree using 13 fossil
calibrations and a family-level UCE dataset of over 200 songbirds. The
deepest node we calibrated was the split between Foulehaio and all other
taxa (31.1286 Ma) and crown Corvides was calibrated at 25.57 Ma (see
Figure S1 for placement of all monophyly statements and Table S4 for nodes
with secondary calibrations). We conducted nine independent Markov chain
Monte Carlo (MCMC) runs of 107 generations with unlinked site models (GTR)
for each of 12 partitions. We applied an uncorrelated lognormal (UCLN)
clock model and a birth-death tree model, which were linked across
partitions. We assessed convergence of parameter estimates and determined
that ESS values were all > 200 by visualizing trace files in Tracer
v.1.6 (Rambaut & Drummond, 2007). We discarded the first 16–26% of
MCMC generations as burn-in, resulting in 38,834 trees. We sub-sampled
every 4th tree before summarizing the remaining 9,708 trees in a 50%
maximum-clade-credibility tree using LogCombiner v.2.5.1 and TreeAnnotator
v.2.5.1 (Drummond and Rambaut 2007). Historical Biogeography We
investigated the biogeographical history of Corvides using BioGeoBEARS
(Matzke 2013, 2014) in R (R Development Core Team 2013). This program
allows researchers to implement user-defined geographic areas that are
constrained to infer ancestral ranges on a time-calibrated phylogeny in a
likelihood framework (Matzke 2013). We tested the program’s six different
biogeographic models (DEC, DEC+j, DIVA, DIVA+j, BayArea-like, and
BayArea-like+j) and used AIC to assess model fitness, preferring the model
with the lowest AIC score. The dispersal-extinction-cladogenesis (DEC)
model allows for range expansion (dispersal) and contraction (extinction)
along with constant cladogenesis, in which both daughter lineages have
equal probability to inherit the ancestral range. The inclusion of the
founder event parameter (+j) allows for long-distance dispersal, in which
the daughter lineage inhabits a range that the ancestor did not. Though
this parameter has been the subject of criticism as it has a tendency to
over-explain biogeographic variation (Ree and Sanmartín 2018), we
evaluated our results with and without +j. However, we included the
+j parameter because it is important for highly dispersive island
lineages, for which Corvides has many examples (such
as Pachycephala whistlers,Edolisoma cicadabirds, Rhipidura fantails, and
even Chasiempis monarchs and one Corvus that have reached Hawai’i).
We estimated eight biogeographic scenarios, whereby we coded from two to
seven areas globally (Tables S5–6). The 2-area scheme explored dispersal
out of the Sahul Region across Wallace’s Line (from east to west), whereas
our 3-area scheme isolated the Philippines and therefore tested the
effects of Huxley’s Line (Huxley 1868). The 4-area scheme divided the
world west of Wallace’s Line into three areas (Africa, the Palearctic and
Southeast Asia [“Eurasia”], and the Americas), whereas the 5-area scheme
again isolated the Philippines in another exploration of Huxley’s Line
(Esselstyn et al. 2010). The 6-area scheme further divided New Guinea into
its own area, separate from Australia. The 7-area scheme split the oceanic
islands of Wallacea and the Philippines as separate from the Pacific
archipelagos of Melanesia and Polynesia, inclusive of Hawai’i and New
Zealand. We aimed to account for geological evidence that indicates the
emergence of New Guinea as a biogeographically relevant landmass occurred
between 5–12 Ma (Hill and Hall 2003; van Ufford and Cloos 2005; Toussaint
et al. 2014). Therefore, we ran the 6- and 7-area schemes with and without
a time constraint on the orogeny of New Guinea. We set this time
constraint on New Guinea as an ancestral area at 15 Ma, following Moyle et
al. (2016). State-dependent Diversification To assess how geographic
barriers influence diversification within Corvides, we fit eight different
state-dependent speciation and extinction models (SSE; Maddison et al.
2007) in RevBayes (Höhna et al. 2016). SSE models estimate per-lineage
speciation and extinction rate parameters specific to particular traits,
such as geography. We assessed how diversification within Corvides is best
explained by in situ diversification within island/continental systems,
crossing Wallace’s Line, a combination of both, or as an unmeasured
trait. Generally, we tested two traits: island occurrence and
dispersal across Wallace’s Line (Wallace 1863). We categorized islands as
those that were either of oceanic origin (e.g., Solomon Islands, Mascarene
Islands) or continental drift islands that had separated for the duration
of the crown age of Corvides to the present (e.g., Greater Antilles,
Madagascar, New Zealand, and New Caledonia). In terms of the timing of
evolutionary dynamics of Corvides, these continental fragments are
effectively isolated from other landmasses as oceanic islands. However, we
considered continental plate (i.e., land-bridge) islands as “continents.”
This type of island was connected with the mainland by a land-bridge
during cyclical periods of intermittent low sea level caused by
Pleistocene glacial cycles, thus allowing free movement between landmasses
(e.g., New Guinea to Australia, Greater Sundas to mainland Southeast
Asia). Our second trait was whether a taxon was present east or west of
Wallace’s Line (Wallace 1863; Figure S2); taxa that occur on both sides of
Wallace’s Line were considered “widespread.” First, we implemented
GeoHiSSE (Caetano et al. 2018), which is a model extension of geographic
state-dependent speciation and extinction (GeoSSE; (Goldberg et al.
2011)), which incorporates a hidden state to allow for the possibility of
diversification changes that are due to unmeasured factors other than
geography (HiSSE; (Beaulieu and O’Meara 2016)). Incorporation of a hidden
state ameliorates problems with type I error that have been widely
documented for SSE models, including GeoSSE (Alves et al. 2017), by
redefining the null hypothesis to be one that has more complexity in
diversification. We defined three main states in which extant Corvides
taxa are found as either continent (state 0 as defined by figure S2),
island (state 1), or both (i.e., widespread, state 01). For each of the
states, we extended them to include a hidden state (denoted by A or B) to
allow the data to be uncertain to test if diversification within Corvides
is not associated with our measured geographic trait. We added anagenetic
dispersal (d0A, d01A, d1A, d0B, d01B, and d1B), which describes dispersal
from one state to the other, such as a continental taxon dispersing to an
island, without a speciation event. We incorporated cladogenetic
speciation, which includes within region
(s0A, s1A, s0B, and s1B) speciation events. We allowed for extirpation
(X0A, X1A, X0B, X1B) within endemic regions. Therefore, three classes of
parameters (anagenetic dispersal, cladogenetic speciation, and
extirpation) govern the transitions out of states within the GeoHiSSE
model framework. Following Caetano et al. (2018), we did not allow for in
situ speciation (s) or extirpation (X) for “widespread” taxa or dispersal
(d) directly between endemic regions (i.e., transition from state 0
directly to state 1 is not allowed). In addition, we extended the
GeoHiSSE model (referred to as “extended GeoHiSSE”) of Caetano et al.
(2018) to allow separate parameter estimations of the three types of
cladogenetic range evolution that it models: allopatric cladogenesis,
widespread sympatry, and subset sympatry (Fig. 1). It is important to
distinguish here that “sympatry” within the GeoHiSSE model framework is
different from “sympatry” in the ecological sense. Both widespread
sympatry and subset sympatry range evolution relate to coarse areas. For
example, two daughter lineages could occupy the same state in the GeoHiSSE
model (i.e., widespread sympatry; Fig. 1) but are allopatric in nature.
To explore how the combined effect of both biogeographic barriers
(insularity within island systems and Wallace’s Line) affected
diversification within Corvides, we also implemented the Multistate
Speciation and Extinction (MuSSE) model with a hidden state in RevBayes.
We developed MuSSE and MuHiSSE (Nakov et al. 2019) models to infer
diversification within four states: islands east of Wallace’s Line (0),
continents east of Wallace’s Line (1), islands west of Wallace’s Line (2),
and continents west of Wallace’s Line (3). We allowed for in
situ speciation (s), anagenetic transitions between regions (i.e.,
dispersal; d), and “extinction” (X). Unlike GeoHiSSE, the MuSSE/MuHiSSE
model framework does not allow for lineages to occupy the widespread (01)
state. This distinction is important because GeoHiSSE’s extirpation
parameter, which refers generally to an event of range reduction if in the
widespread state, is inherently different than MuSSE/MuHiSSE’s extirpation
parameter, which is confounded with extinction of the lineage. To limit
confusion between these parameters across models, we use the term
extinction for MuSSE/MuHiSSE. For each analysis, we ran between five
and 16 independent chains to speed up computational time; total
generations varied from 5.4x106 to 1.0x109 (Table S7). After assessing
convergence (with ESS values >200), we combined posterior
distributions in LogCombiner v.1.10.4 (Drummond and Rambaut 2007) with 25%
burnin. Code for all RevBayes analyses is available at
github.com/JennaMcCullough/Corvides-SSE and Dryad [#TBD].
This dataset contains: 1 nexus formatted 12-loci
supermatrix alignment of 728 Corvides species and outgroup
representatives. 1 nexus formatted alignment of the
concatenated 75% complete UCE matrix for 86 genera 1 tree file
of the final MCC tree that we present in this study 1 PDF file
that has supplementary methods and results and 23 supplemental figures
1 excel file that has 10 supplemental tables