10.5061/DRYAD.4QRFJ6QBM
Szpiech, Zachary
0000-0001-6372-8224
Pennsylvania State University
DeGiorgio, Michael
0000-0003-4908-7234
Florida Atlantic University
A spatially aware likelihood test to detect sweeps from haplotype distributions
Dryad
dataset
2021
FOS: Biological sciences
National Science Foundation
https://ror.org/021nxhr62
DBI-2130666
Foundation for the National Institutes of Health
https://ror.org/00k86s890
R35GM128590
National Science Foundation
https://ror.org/021nxhr62
DEB-1949268
National Science Foundation
https://ror.org/021nxhr62
BCS-2001063
Pennsylvania State University Startup Funds
2022-06-01T00:00:00Z
2022-06-01T00:00:00Z
en
https://doi.org/10.1101/2021.05.12.443825
https://doi.org/10.1371/journal.pgen.1010134
112127478548 bytes
5
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
The inference of positive selection in genomes is a problem of great
interest in evolutionary genomics. By identifying putative regions of the
genome that contain adaptive mutations, we are able to learn about the
biology of organisms and their evolutionary history. Here we introduce a
composite likelihood method that identifies recently completed or ongoing
positive selection by searching for extreme distortions in the spatial
distribution of the haplotype frequency spectrum along the genome, relative
to the genome-wide expectation, which is taken to represent neutrality. Furthermore, the
method simultaneously infers two parameters of the sweep: the number of
sweeping haplotypes and the “width” of the sweep, which is related to the
strength and timing of selection. We demonstrate that this method
outperforms the leading haplotype-based selection statistics, though
strong signals in low-recombination regions merit extra scrutiny. As a
positive control, we apply it to two well-studied human populations from
the 1000 Genomes Project and examine haplotype frequency spectrum patterns
at the LCT and MHC loci. We also apply it to a data set of brown rats
sampled in NYC and identify genes related to olfactory perception. To
facilitate use of this method, we have implemented it in user-friendly
open-source software.
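The abstract describes the method in terms of the haplotype frequency spectrum (HFS) in genomic windows, compared against a genome-wide expectation. The Python sketch below is a rough illustration of that input only. It computes a truncated HFS for one window and averages it across windows as a stand-in for the genome-wide expectation. It is not the saltiLASSI implementation and does not compute the composite likelihood. Several details are assumptions made for this example: the string encoding of haplotypes, the truncation level `k`, and the renormalization of the top `k` frequencies. The helper names `truncated_hfs` and `genome_wide_hfs` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def truncated_hfs(haplotypes: Sequence[str], k: int) -> List[float]:
    """Return the k largest haplotype frequencies in one window, renormalized to sum to 1.

    Assumption (illustrative only): each haplotype is an equal-length string of
    alleles (e.g. "0110") for the SNPs in the window; k is a user-chosen truncation.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(haplotypes) == 0:
        raise ValueError("need at least one haplotype")
    # Count identical haplotypes and keep the k most common counts, largest first.
    counts = sorted(Counter(haplotypes).values(), reverse=True)[:k]
    # Pad with zeros if the window has fewer than k distinct haplotypes.
    counts += [0] * (k - len(counts))
    total = sum(counts)
    return [c / total for c in counts]


def genome_wide_hfs(windows: Sequence[Sequence[str]], k: int) -> List[float]:
    """Average per-window truncated spectra; a hypothetical stand-in for the neutral expectation."""
    spectra = [truncated_hfs(w, k) for w in windows]
    # zip(*spectra) groups the i-th frequency from every window together.
    return [sum(col) / len(spectra) for col in zip(*spectra)]
```

For example, a window with haplotype counts 3, 2 and 1, truncated at `k = 2`, gives the spectrum `[0.6, 0.4]`. A sweep is expected to inflate the leading frequencies relative to the genome-wide average spectrum.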
These data comprise all files pertaining to power simulations and real
data analysis examples for the saltiLASSI method for detecting selective
sweeps in population genomic data.
power_sims.tar.gz - Contains all scripts necessary for performing
simulations and evaluating power for all statistics in the manuscript
NYC_rats.tar - Contains the raw results from running the saltiLASSI method
on the NYC brown rats data set
TGP_humans.tar - Contains the raw results, matched demographic simulations,
and processing scripts for the CEU and YRI data sets
scripts.tar - Contains scripts for processing and plotting results from
both the human and rat data analyses