10.5061/DRYAD.4R7B826
Machado, Heather E.
Wellcome Sanger Institute
Bergland, Alan O.
University of Virginia
Taylor, Ryan
Stanford University
Tilk, Susanne
Stanford University
Behrman, Emily
University of Pennsylvania
Dyer, Kelly
University of Georgia
Fabian, Daniel K.
University of Veterinary Medicine Vienna
Flatt, Thomas
University of Veterinary Medicine Vienna
González, Josefa
Pompeu Fabra University
Karasov, Talia L.
University of Chicago
Kozeretska, Iryna
Taras Shevchenko National University of Kyiv
Lazzaro, Brian P.
Cornell University
Merritt, Thomas JS
Laurentian University
Pool, John E.
University of Wisconsin-Madison
O’Brien, Katherine
University of Pennsylvania
Rajpurohit, Subhash
University of Pennsylvania
Roy, Paula R.
University of Kansas
Schaeffer, Stephen W.
Pennsylvania State University
Serga, Svitlana
Pompeu Fabra University
Schmidt, Paul
University of Pennsylvania
Petrov, Dmitri
Stanford University
Kim, Bernard
Stanford University
Data from: Broad geographic sampling reveals predictable, pervasive, and
strong seasonal adaptation in Drosophila
Dryad
dataset
2019
2021-02-09T00:00:00Z
2021-02-09T00:00:00Z
en
5142669708 bytes
4
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
To advance our understanding of adaptation to temporally varying selection
pressures, we identified signatures of seasonal adaptation occurring in
parallel among Drosophila melanogaster populations. Specifically, we
estimated allele frequencies genome-wide from flies sampled early and late
in the growing season from 20 widely dispersed populations. We identified
parallel seasonal allele frequency shifts across North America and Europe,
demonstrating that seasonal adaptation is a general phenomenon of
temperate fly populations. Seasonally fluctuating polymorphisms are
enriched at large chromosomal inversions and we find a broad concordance
between seasonal and spatial allele frequency change. The direction of
allele frequency change at seasonally variable polymorphisms can be
predicted by weather conditions in the weeks prior to sampling, linking
the environment and the genomic response to selection. Our results suggest
that fluctuating selection is an important evolutionary force affecting
patterns of genetic variation in Drosophila.
VCF of all SNPs (limited filtering): chrX, chr2L, chr3L, chr3R, chr2R
We called SNPs using the program VarScan v2.3.8 using a p-value of 0.05,
minimum variant frequency of 0.005, minimum average quality of 20, and
minimum coverage of 10 (Koboldt et al. 2012). We filtered out SNPs within
10bp of an indel (they are more likely to be spurious), variants in
repetitive regions (identified by RepeatMasker and downloaded from the
UCSC Genome browser), and nucleotides with more than two alleles. See
Supplemental Table 1 for mapping of column headers to population
information. The same description applies to each of the five files:
chrX: mel_X_mapping2016.varscanSNP_noindel_dpfilter_biallelic_repeatmask.recode.vcf.gz
chr2L: mel_2L_mapping2016.varscanSNP_noindel_dpfilter_biallelic_repeatmask.recode.vcf.gz
chr3L: mel_3L_mapping2016.varscanSNP_noindel_dpfilter_biallelic_repeatmask.recode.vcf.gz
chr3R: mel_3R_mapping2016.varscanSNP_noindel_dpfilter_biallelic_repeatmask.recode.vcf.gz
chr2R: mel_2R_mapping2016.varscanSNP_noindel_dpfilter_biallelic_repeatmask.recode.vcf.gz

Allele frequency and dp per SNP
Allele frequency and read depth (adjusted for number of individuals in
pool) for each sample and SNP. This R object is used in downstream
analyses.
File: mel_freqdp_042016_Ne_fixed_correctBAVI.Rdata

Matched controls
Matched controls, with each SNP matched by chromosome, inversion status,
recombination rate, and spring allele frequency. 100 sets of matched
controls.
File: bootstrap_fmean_dp.mel.medfreq01_RRgrt0.recRate.polymorphic.txt

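To give a rough sense of what an allele frequency and a read depth "adjusted for number of individuals in pool" can look like for a single SNP in a single pooled sample, here is a minimal Python sketch. The adjustment shown, 1 / (1/chromosomes + 1/reads), is one common approximation for pooled sequencing and is an assumption here; this record does not state which formula produced the R object. The function names and the example numbers are likewise illustrative only.

```python
def allele_frequency(ref_reads: int, alt_reads: int) -> float:
    """Alternate-allele frequency from read counts at one SNP in one pool."""
    depth = ref_reads + alt_reads
    if depth == 0:
        raise ValueError("no reads cover this site")
    return alt_reads / depth


def effective_depth(read_depth: int, n_chromosomes: int) -> float:
    """Assumed pool-size adjustment (not confirmed by this record).

    Combines two sampling steps: chromosomes sampled into the pool and
    reads sampled from the pool. The result is always smaller than both
    the read depth and the number of pooled chromosomes.
    """
    if read_depth <= 0 or n_chromosomes <= 0:
        raise ValueError("read_depth and n_chromosomes must be positive")
    return 1.0 / (1.0 / n_chromosomes + 1.0 / read_depth)


if __name__ == "__main__":
    # Hypothetical autosomal site: 75 pooled males carry 150 chromosomes,
    # and the site is covered by 94 reads, 10 of which show the alternate allele.
    print(allele_frequency(ref_reads=84, alt_reads=10))
    print(effective_depth(read_depth=94, n_chromosomes=2 * 75))
```

The pool size and depth in the usage example echo the averages reported for this dataset (about 75 males per sample, about 94x coverage); per-sample values are listed in Supplemental Table 1.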
We assembled 73 samples of D. melanogaster, 61 representing newly
collected and sequenced samples and 12 representing previously published
samples (Bergland et al., 2014; Kapun et al., 2016). Locations, collection
dates, number of individuals sampled, and depth of sequencing for all
samples are listed in Supplemental Table 1 (Machado et al., 2021). For
each sample, members of the Drosophila Real-Time Evolution Consortium
collected an average of 75 male flies using direct aspiration from
substrate, netting, or trapping at orchards and residential areas. Flies
were confirmed to be D. melanogaster by examination of the male genital
arch. We extracted DNA by first pooling all individuals from a sample,
grinding the tissue together in extraction buffer, and using a lithium
chloride – potassium acetate extraction protocol (see Bergland et al. 2014
for details on buffers and solutions). We prepared sequencing libraries
using a modified Illumina protocol (Bergland et al. 2014) and Illumina
TruSeq adapters. Paired-end 125bp libraries were sequenced to an average
of 94x coverage either at the Stanford Sequencing Service Center on an
Illumina HiSeq 2000, or at the Stanford Functional Genomics facility on an
Illumina HiSeq 4000. The following sequence data processing was
performed on both the new and the previously published data. We trimmed
low-quality 3’ and 5’ read ends (sequence quality < 20) using the
program cutadapt v1.8.1 (Martin 2011). We mapped the raw reads to the D.
melanogaster genome v5.5 (and to the D. simulans genome v2.01; flybase.org)
using the mem algorithm of bwa v0.7.12 with default parameters (Li &
Durbin 2009), and used the program SAMtools v1.2 for bam file manipulation
(functions index, sort, and mpileup) (Li et al. 2009). We used the program
picard v2.0.1 to remove PCR duplicates (http://picard.sourceforge.net) and
the program GATK v3.2-2 for indel realignment (McKenna et al. 2010). We
called SNPs and indels using the program VarScan v2.3.8 using a p-value of
0.05, minimum variant frequency of 0.005, minimum average quality of 20,
and minimum coverage of 10 (Koboldt et al. 2012). We filtered out SNPs
within 10bp of an indel (they are more likely to be spurious), variants in
repetitive regions (identified by RepeatMasker and downloaded from the
UCSC Genome browser), and nucleotides with more than two alleles. Because
we sequenced only male individuals, the X chromosome had lower coverage
and was not used in our analysis.
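Two of the post-calling filters described above (dropping SNPs within 10bp of an indel, and dropping sites with more than two alleles) can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline actually used: the tuple layout, the function name `filter_snps`, the treatment of each indel as a single position, and the inclusive 10bp boundary are all hypothetical choices. The repeat-region filter is omitted.

```python
from bisect import bisect_left


def filter_snps(snps, indel_positions, window=10):
    """Keep biallelic SNPs that lie more than `window` bp from any indel.

    snps: iterable of (chrom, pos, ref, alts), where alts is a list of
          alternate alleles.
    indel_positions: dict mapping chrom -> list of indel positions
          (assumption: each indel is represented by one coordinate).
    """
    sorted_indels = {c: sorted(p) for c, p in indel_positions.items()}
    kept = []
    for chrom, pos, ref, alts in snps:
        # More than one alternate allele means more than two alleles in total.
        if len(alts) != 1:
            continue
        positions = sorted_indels.get(chrom, [])
        # First indel at or after the left edge of the window around this SNP.
        i = bisect_left(positions, pos - window)
        if i < len(positions) and positions[i] <= pos + window:
            continue  # an indel falls inside [pos - window, pos + window]
        kept.append((chrom, pos, ref, alts))
    return kept


if __name__ == "__main__":
    example_snps = [
        ("2L", 100, "A", ["T"]),       # 5 bp from the indel at 105: dropped
        ("2L", 116, "C", ["G"]),       # 11 bp from the indel: kept
        ("2L", 200, "G", ["A", "C"]),  # three alleles: dropped
        ("3R", 100, "T", ["C"]),       # no indels on this arm: kept
    ]
    print(filter_snps(example_snps, {"2L": [105]}))
```

Sorting the indel positions once per chromosome arm and using a binary search keeps the check fast even with millions of SNPs, which matters at genome-wide scale.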