10.5061/DRYAD.5DV41NS4N
Gugger, Paul
0000-0002-4464-8453
University of Maryland Center For Environmental Sciences
Fitz-Gibbon, Sorel
0000-0001-7090-5719
University of California Los Angeles
Albarrán-Lara, Ana
University of California Los Angeles
Wright, Jessica
Agricultural Research Service
Sork, Victoria
University of California Los Angeles
Filtered SNP tables - Rangewide, Hamilton, Tejon, and Madera transects
Dryad
dataset
2020
FOS: Biological sciences
National Science Foundation
https://ror.org/021nxhr62
IOS-1444661
US Forest Service
https://ror.org/03zmjc935
UC-MEXUS/CONACyT*
University of California, Los Angeles
https://ror.org/046rm7j60
UC-MEXUS/CONACyT
2020-10-20T00:00:00Z
2020-10-20T00:00:00Z
en
13294481 bytes
2
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Understanding how the environment shapes genetic variation provides
critical insight about the evolution of local adaptation in natural
populations. At multiple spatial scales and multiple geographic contexts
within a single species, such information could address a number of
fundamental questions about the scale of local adaptation and whether or
not the same loci are involved at different spatial scales or geographic
contexts. We used landscape genomic approaches from three local
elevational transects and range-wide sampling to 1) identify genetic
variation underlying local adaptation to environmental gradients in the
California endemic oak, Quercus lobata, 2) examine whether putatively
adaptive SNPs show signatures of selection at multiple spatial scales, and
3) map putatively adaptive variation to assess the scale and pattern of
local adaptation. Of over 10k single-nucleotide polymorphisms (SNPs)
generated with genotyping-by-sequencing, we found signatures of natural
selection by climate or local environment at over 600 SNPs (536 loci),
some at multiple spatial scales across multiple analyses. Candidate SNPs
identified with gene–environment tests (LFMM) at the range-wide scale also
showed elevated associations with climate variables compared to the
background at both range-wide and elevational transect scales with
gradient forest analysis. Some loci overlap with those detected in other
oak species, raising the question of whether the same loci might be
involved in local climate adaptation in different congeneric species that
inhabit different geographic contexts. Mapping landscape patterns of
adaptive versus background genetic variation identified regions of marked
local adaptation and suggests nonlinear association of candidate SNPs and
environmental variables. Taken together, our results offer robust evidence
for novel candidate genes for local climate adaptation at multiple spatial
scales.
Sampling and processing. In fall of 2012, leaf samples from 436 adult
trees were collected from throughout the range of valley oak and separated
from each other by at least 50 m (Fig. 1; Table S1). Within this
range-wide sampling, three areas were more intensively sampled along
elevational gradients (subsets of the 436 total samples). The Hamilton
transect up Mt. Hamilton spans 12 km and includes 30 samples from 383–1112
m elevation; the Madera transect from the eastern Central Valley near the
town of Madera to the Sierra Nevada foothills spans 69 km and includes 45
samples from 78–607 m; and the Tejon transect from the southern Central
Valley up Cordon Ridge in Tejon Ranch spans 17 km and includes 52 samples
from 361–1766 m elevation (Fig. 1, S1, and S2). Fresh leaf samples were
stored on ice until arriving at UCLA, where they were stored at −80°C.
Total genomic DNA was extracted from approximately 50 mg of frozen tissue
using the Qiagen DNeasy Plant Mini Kit with an added “prewash” step to
minimize secondary compounds in the final product (Gaddis, Zukin,
Dieterich, Braker, & Sork, 2014). The prewash was as follows:
after bead-based tissue grinding under liquid N2, 1 mL of prewash buffer
was added, the mixture was shaken for 10–20 s at 30 Hz and then spun for
10 min at 10,000 rpm in a microcentrifuge, the supernatant was discarded,
and finally the pellet was processed with the standard DNeasy protocol
starting at the Buffer AP1/RNase A addition step. The prewash buffer was
composed of 100 mM Tris-HCl (from pH 8.0 stock), 50 mM EDTA, 1 M NaCl,
0.01 g/mL polyvinylpyrrolidone (PVP) (k-30, SABC), and 4.5 μL/mL
2-mercaptoethanol (added immediately prior to use). Genotyping by
sequencing. Total genomic DNA was prepared for sequencing using an
efficient restriction-enzyme-based approach commonly known as genotyping
by sequencing (GBS) (Elshire et al., 2011). GBS is a cost-effective method
that has proven fruitful in population and landscape genomic studies with
related goals (Gugger, Liang, Sork, Hodgskiss, & Wright, 2018;
Martins et al., 2018; Parchman, Jahner, Uckele, Galland, & Eckert,
2018). Briefly, DNA was digested with a restriction enzyme, common and
unique barcoded adapters with overhangs complementary to the cut site were
ligated to each sample, samples were pooled in equimolar ratios, and the
pooled library was PCR-amplified and sent for Illumina sequencing. We
largely followed the original protocol, including using the same adapter
sequences, adapter concentration (0.036 ng/µL of each adapter), and
restriction enzyme (ApeKI). However, we pooled 48 samples per library
prep/sequencing lane rather than 96; adapters were added during the
ligation step rather than prior to restriction digestion; AMPure XP
bead-based size selection/purification steps were added after the ligation
step and repeated after the PCR step to ensure a consistent distribution
of fragment sizes between 200 and 500 bp (including adapters) among all
preps; and we reduced the number PCR cycles to 16 from 18. Final libraries
were checked for the proper size distribution on an Agilent BioAnalyzer
with the High Sensitivity DNA assay and quantified using a Qubit
fluorometer. Libraries were sent to the UCLA Broad Stem Cell Research
Center for single-end, 100-bp sequencing on an Illumina HiSeq2000 v3. SNP
calling. Illumina reads were demultiplexed and quality filtered using
Stacks 1.28 to 1.41 (Catchen, Amores, Hohenlohe, Cresko, &
Postlethwait, 2011) process_radtags, which removed adapter sequence with
up to two mismatches (adapter_mm), recovered barcodes with up to one
mismatch to the expected barcodes (r), removed any read with an uncalled
base (c), discarded low quality reads as defined by default settings (q),
and trimmed all reads to 92 bp (t). Using BWA 0.7.12 (Li & Durbin,
2010), the filtered reads were aligned to the Quercus lobata reference
genome version 0.5 (NCBI accession # ID 308314, also available at
http//valleyoak.ucla.edu; (Sork, Fitz-Gibbon, et al., 2016)). We used GATK
3.7 (DePristo et al., 2011) to identify SNPs in each aligned sample using
a minimum Phred-scaled confidence threshold of 30. We then used
“VariantFiltration” and “SelectVariants” tools in GATK to exclude low
quality variants, applying the following filters: QD < 10.0
(quality by depth), DP < 4 (genotype read depth), and ExcessHet
> 100. We used VCFtools 0.1.15 (Danecek et al., 2011) to filter the
SNPs to include only diallelic sites, present in at least 90% of
individuals, minor allele frequency (MAF) ≥ 0.10, and mean depth of
coverage across all samples < 80× to filter organellar DNA and
sites aligning to genomic regions that may have been overcollapsed in the
reference genome. Finally, SNPs were pruned in plink 1.90b4.9 (Chang et
al., 2015) to remove those exhibiting high linkage disequilibrium defined
by using a 5 kb window, sliding 5 SNPs, and removing a SNP from all pairs
with r2 > 0.5. The above filtering was performed four times: once
for all samples together to generate a range-wide data set and then once
for each set of samples belonging to a transect. Our filtering strategy
maximizes the number of high-quality SNPs that can be meaningfully
analyzed in each data set.