10.5061/DRYAD.7501V9S
Lucek, Kay
University of Basel
Hohmann, Nora
University of Basel
Willi, Yvonne
University of Basel
Data from: Postglacial ecotype formation under outcrossing and
self-fertilization in Arabidopsis lyrata
Dryad
dataset
2019
Arabidopsis lyrata subsp. lyrata
divergence-with-gene-flow
soil substrate
self-incompatibility
genome structure
Holocene
2019-01-31T15:29:01Z
2019-01-31T15:29:01Z
en
https://doi.org/10.1111/mec.15035
307745945 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The process of ecotype formation has been invoked as an important driver
of postglacial biodiversity, because many species colonized heterogeneous
habitats and subsequently experienced divergent selection. Ecotype
formation has been predominantly studied in outcrossing taxa, while far
less attention has been paid to the implications of mating system shifts.
Here we studied the genomic footprint of ecotype formation in Arabidopsis
lyrata subsp. lyrata. The species colonized both rocky and sandy
substrates during its postglacial range expansion, while it also shifted
the mating system from predominantly outcrossing to predominantly selfing
in a number of regions. We performed an association study on pooled
whole-genome re-sequence data of 20 populations, which suggested genes and
gene ontology terms related to substrate adaptation. We validated results
by comparing root growth between plants from the two substrates in a
common environment and found that plants originating from sand –
independent of mating system – grew roots faster and produced more
side-roots, potentially as a response to water limitation in the wild.
Furthermore, we found single nucleotide polymorphisms associated with
substrate-related ecotypes to be more clustered among selfing populations,
presumably due to higher genome-wide linkage disequilibrium. Overall we
show that a shift to selfing could initially facilitate ecotype formation
linked to substrate, likely because selfing reduces effective
recombination.
LD estimate per population using LDx
LD estimates based on poolseq data for 20 populations using the program LDx
(https://sourceforge.net/p/ldx/wiki/Home/). The program is described here:
Feder AF, Petrov DA, Bergland AO (2012) LDx: Estimation of Linkage
Disequilibrium from High-Throughput Pooled Resequencing Data. PLOS ONE
7(11): e48588. https://doi.org/10.1371/journal.pone.0048588
The output follows the data format of LDx. The columns are:
1) Location of SNP1
2) Location of SNP2
3) Number of pairs observed with x_11
4) Number of pairs observed with x_12
5) Number of pairs observed with x_21
6) Number of pairs observed with x_22
7) Estimate for allele frequency of allele A
8) Estimate for allele frequency of allele B
9) Read depth for SNP1
10) Read depth for SNP2
11) Intersecting read depth
12) Approx MLE R2 (low end of interval)
13) Approx MLE estimate
14) Approx MLE (high end of interval)
15) Direction Computation R2
16) allele A
17) allele a
18) allele B
19) allele b
LDx was run for each population separately with the following settings:
perl lds.pl -l 100 -h 500 -q 28 -i 5 -a 0.15
LD_estimates_per_population_using_ldx.zip
SNP genotypes (VCF) files
Contains three genotype files (VCF) using either all twelve outcrossing
populations, all eight selfing populations or all twenty populations
together. Each VCF file was filtered as described in the paper. All
genotypes were used for the downstream GWAS analysis using the software
BayPass.
Genotypes_VCFs.zip
Location of outlier SNPs for each GWAS analysis
Contains the location of the top 1% outlier SNPs based on Bayes factors
averaged across ten independent GWAS analyses in the program BayPass.
Provided are five files:
i) outliers using only outcrossing populations
(outcrossing_only_outliers.txt)
ii) outliers using only selfing populations (selfing_only_outliers.txt)
iii) outliers using all 20 populations together
(selfing_and_outcrossing_combined_outliers_using_all_populations.txt)
iv) outliers using the combined dataset but only analyzing outcrossing
populations
(selfing_and_outcrossing_combined_outliers_using_only_outcrossing_populations.txt)
v) outliers using the combined dataset but only analyzing selfing
populations
(selfing_and_outcrossing_combined_outliers_using_only_selfing_populations.txt)
Each file has two columns: the first is the scaffold, the second is the
SNP position.
Outlier_SNPs.zip
Root Morphology Data
Excel sheet containing the experimental data from a growth experiment on
agar plates using individuals that originate from a rock or sand
substrate. For each individual, root growth was measured and the number of
primary side roots was counted through time. The columns describe the
following:
1. Substrate where wild type individuals were collected (rock or sand)
2. ID for the replicate run [1 or 2]
3. Agar plate ID [A = Replicate 1, B = Replicate B]
4. Seed Family
5. Population of origin
6. Relative position on agar plate [from left to right]
7. Consecutive number for each measurement for each individual
8. Individual seed ID [Population + Seed Family + Replicate]
9. Population & seed family
10. Mating system [O - Outcrossing, S - Selfing]
11. Phylogenetic Cluster [E - East; W - West]
12. Date measurement was taken in 2017
13. Days since germination
14. Days since germination
15. Number of primary side roots
Root_morphology_data.xlsx
Population IDs
File providing the population IDs used in the paper (ID) and the IDs used
for the data files (Data ID). The Data ID is consistent with the one used
for the genomic data (BioProject: PRJEB19338).
population_IDs.txt
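As a reading aid for the file descriptions above, the following is a minimal Python sketch of how one row of the LDx output and one row of an outlier file might be parsed. It is not part of the deposited data or the authors' pipeline. It assumes that both file types are whitespace-delimited with no header line, that SNP locations in the LDx output are plain integers, and that non-numeric placeholders may occur in numeric columns. The field names in LDX_FIELDS and both function names are hypothetical labels chosen here to mirror the 19 documented columns.

```python
import math

# Hypothetical field names mirroring the 19 documented LDx output columns.
LDX_FIELDS = [
    "pos_snp1", "pos_snp2",                      # columns 1-2
    "n_x11", "n_x12", "n_x21", "n_x22",          # columns 3-6
    "freq_a", "freq_b",                          # columns 7-8
    "depth_snp1", "depth_snp2", "depth_both",    # columns 9-11
    "mle_r2_low", "mle_r2", "mle_r2_high",       # columns 12-14
    "direct_r2",                                 # column 15
    "allele_A", "allele_a", "allele_B", "allele_b",  # columns 16-19
]
INT_FIELDS = {"pos_snp1", "pos_snp2"}
STR_FIELDS = {"allele_A", "allele_a", "allele_B", "allele_b"}


def _to_float(token):
    """Convert a token to float; non-numeric tokens become NaN (assumption)."""
    try:
        return float(token)
    except ValueError:
        return math.nan


def parse_ldx_line(line):
    """Parse one whitespace-delimited LDx output row into a dict."""
    parts = line.split()
    if len(parts) != len(LDX_FIELDS):
        raise ValueError(
            f"expected {len(LDX_FIELDS)} columns, found {len(parts)}"
        )
    record = {}
    for name, token in zip(LDX_FIELDS, parts):
        if name in STR_FIELDS:
            record[name] = token
        elif name in INT_FIELDS:
            record[name] = int(token)
        else:
            record[name] = _to_float(token)
    return record


def parse_outlier_line(line):
    """Parse one outlier row: scaffold name, then SNP position."""
    scaffold, position = line.split()
    return scaffold, int(position)
```

If the deposited files turn out to use a different delimiter or to carry a header line, the split and the column-count check would need to be adjusted accordingly.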
North America