10.5061/DRYAD.FC827
Lucas, Lauren K.
Utah State University
Nice, Chris C.
Texas State University
Gompert, Zach
Utah State University
Gompert, Zachariah
Utah State University
Data from: Genetic constraints on wing pattern variation in Lycaeides
butterflies: a case study on mapping complex, multifaceted traits in
structured populations
Dryad
dataset
2018
Genomic prediction
genome-wide association mapping
National Science Foundation
https://ror.org/021nxhr62
DEB-1050355
2018-02-27T19:39:05Z
2018-02-27T19:39:05Z
en
https://doi.org/10.1111/1755-0998.12777
1376983917 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Patterns of phenotypic variation within and among species can be shaped
and constrained by trait genetic architecture. This is particularly true
for complex traits, such as butterfly wing patterns, that consist of
multiple elements. Understanding the genetics of complex trait variation
across species boundaries is difficult, as it necessitates mapping in
structured populations and can involve many loci with small or variable
phenotypic effects. Here, we investigate the genetic architecture of
complex wing pattern variation in Lycaeides butterflies as a case study of
mapping multivariate traits in wild populations that include multiple
nominal species or groups. We identify conserved modules of integrated
wing pattern elements within populations and species. We show that trait
covariances within modules have a genetic basis, and thus represent
genetic constraints that can channel evolution. Consistent with this, we
find evidence that evolutionary changes in wing patterns among populations
and species occur mostly in the directions of genetic covariances within
these groups. Thus, we show that genetic constraints affect patterns of
biological diversity (wing pattern) in Lycaeides, and we provide an
analytical template for similar work in other systems.
Genetic data (filtered vcf file)
This text file contains the filtered genetic data (SNP set) in variant
call format (vcf). This includes genetic data for 78,567 SNPs.
morefilter_filtered2x-70_varsLycGwa.vcf.gz

Variant filtering scripts
This compressed directory contains three perl scripts used for filtering
and processing the genetic data (i.e., the vcf file).
morefilter_filtered2x-70_varsLycGwa.vcf was generated by running the two
filter scripts. Most filters described in the paper are implemented in
the main script, vcfFilter.pl, which was run first. The second script,
filterSomeMore.pl, applies maximum coverage filters and a filter to drop
variants that are near each other (within 3 bp in this case). The final
script, vcf2gl.pl, extracts the genotype likelihoods from the vcf file.
filterScripts.tar.gz

Variant calling script
Shell script used for variant calling with samtools and bcftools.
callvar.sh

Alignment script
This perl script generates shell scripts to submit to a SLURM job
scheduler to run bwa, which we used for DNA sequence alignments. Note that
this depends on having a bwa module installed on a cluster running SLURM.
wrap_qsub_slurm_bwa.pl

Lycaeides melissa reference genome
Reference genome for Lycaeides melissa as a fasta file.
final.assembly.fasta

Lycaeides melissa linkage map
Text file describing the L. melissa linkage map. There are three columns
giving the linkage group (lg), scaffold number (scaf; these match the
reference genome scaffold names), and position (pos) in centimorgans along
the relevant linkage group.
orderedLinkageMap.txt

Gemma BSLMM infiles
This compressed directory contains the infiles for the genomic
prediction/genome-wide association mapping analysis. These are in BIMBAM
format and include mean genotype (*geno*) and phenotype (*pheno*) files.
We include files for size and position (*coord*) traits. Files are
included for all biological levels we considered and for all groups: AN =
L. anna, GNP = L-ID-GNP, ID = L. idas, JA = Jackson Hole, ME = L. melissa
east, MW = L. melissa west, RI = L. anna ricei, SIN = L-ME-SIN, SN =
Sierra Nevada, WA = Warner Mt., and YBG = L-AN-YBG. Files without prefixes
are for the species-complex level analysis. We have also included a perl
wrapper script used to fit the BSLMM in gemma (forkRunGemmaPop.pl).
gemmaInfiles.tar.gz

PCA infile and scripts
This compressed directory contains the phenotype infile
(resid-sizeANDcoord-6vi17-subgroups-NoNA.csv) used for the PCA, as well as
the PCA script PCA_sizeANDposition-final.R and an additional script
defining the Bayesian model that was used for the 95% PC mean ellipses
(fitEllipse.R).
PCA.tar.gz

Phenotypic data
This file contains the wing pattern data, including area and position
measurements. Individual IDs and groups are given in the first and second
columns, respectively.
resid-sizeANDcoord-6vi17-subgroups-NoNA.csv

Genomic prediction script
Perl wrapper script (forks to run multiple jobs) to run the genomic
prediction option in gemma. This is used after the standard BSLMM has been
fit.
forkRunGemmaPrdt.pl

Scripts for processing BSLMM output
This compressed directory contains a series of perl scripts used to
summarize the output from gemma's BSLMM. The calpost* scripts summarize
the hyperparameter estimates (e.g., pve, pge, etc.), the getBvs* scripts
extract breeding values from the standard BSLMM (based only on the
polygenic term), the getPrdtBvs* scripts extract breeding values from
genomic prediction that include SNPs with measurable effects, and the
grabCalsEffects* scripts extract and summarize the SNP effect estimates.
All summaries are across MCMC chains.
helperScripts.tar.gz

QTL summaries and scripts
This compressed directory contains three files. sortedCombinedPips.txt
contains the SNP posterior inclusion probabilities (PIPs) for all traits
and biological levels, averaged across MCMC chains. This is the required
input for the other two files, which are scripts. pipColocal.R quantifies
correlations in PIPs across traits, and plotPipLgs.R runs the QTL
number/density analyses per LG and makes some plots.
PIPbyLg.tar.gz

G- and P-matrix infiles and analyses
This compressed directory includes the genome-estimated breeding values
from genomic prediction (catbv*csv) for each group, phenotypic data for
P-matrices (resid*), and an R script that runs analyses on these files,
matcomp.R (which has annotations throughout). The R script runs the
comparisons of P- and G-matrices and the evolvability/constraint analyses,
and makes related plots.
Gmatrix.tar.gz
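
As an illustration of how the linkage map file described above might be
read, the following is a minimal Python sketch, not part of the deposited
scripts. It assumes the three columns (lg, scaf, pos) are
whitespace-delimited and that any header row can be recognized by a
non-numeric third field; neither detail is stated in the record, so check
the file before relying on this. The function name parse_linkage_map and
the example rows are hypothetical.

```python
from collections import defaultdict


def parse_linkage_map(lines):
    """Group (scaffold, position_cM) pairs by linkage group.

    Assumption: rows are whitespace-delimited as lg, scaf, pos. Rows with
    fewer than three fields, or with a non-numeric third field (such as a
    header), are skipped.
    """
    by_lg = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # blank or malformed row
        lg, scaf, pos = fields[:3]
        try:
            pos_cm = float(pos)
        except ValueError:
            continue  # e.g. a header row such as "lg scaf pos"
        by_lg[lg].append((scaf, pos_cm))
    # Order scaffolds along each linkage group by map position.
    return {lg: sorted(rows, key=lambda r: r[1]) for lg, rows in by_lg.items()}


# Made-up rows for illustration only; these are not values from the dataset.
example = ["lg scaf pos", "1 12 0.0", "1 7 3.5", "2 40 1.2", "1 3 1.1"]
linkage_map = parse_linkage_map(example)
```

With the real file, the same function could be called on an open file
handle, for example parse_linkage_map(open("orderedLinkageMap.txt")).
Scaffold identifiers are kept as strings so that they can be matched
against the scaffold names in the reference fasta without assuming a
numeric format.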