10.5061/DRYAD.SN02V6X64
McKinnon, Jeffrey
0000-0003-4547-0279
East Carolina University
Newsome, W. Burns
East Carolina University
Balakrishnan, Christopher
East Carolina University
Gene expression in male and female sticklebacks from populations with
convergent and divergent throat coloration
Dryad
dataset
2022
FOS: Biological sciences
Gene expression
Carotenoid
threespine stickleback
Gasterosteus acculeatus
pigment
RNA-sequencing
National Institute of General Medical Sciences
https://ror.org/04q48ey07
R15GM109291
East Carolina University
2022-04-18T00:00:00Z
2022-04-18T00:00:00Z
en
https://doi.org/10.5281/zenodo.6342438
21981439 bytes
4
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Understanding of genetic mechanisms underlying variation in sexual
dichromatism remains limited, especially for carotenoid-based colors. We
addressed this knowledge gap in a gene expression study with threespine
stickleback. We compared male and female throat tissues across five
populations, including two in which female red coloration has evolved
convergently. We found that the expression of individual genes, gene
ontologies, and coexpression networks associated with red female color
within a population differed between California and British Columbia
populations, suggesting differences in underlying mechanisms. Comparing
females from each of these populations to females from populations
dominated by dull females, we again found extensive expression
differences. For each population, genes and networks associated with
female red color showed the same patterns for males only inconsistently.
The functional roles of genes showing correlated expression with female
color are unclear within populations, whereas genes highlighted through
inter-population comparisons include some previously suggested to function
in carotenoid pathways. Among these, the most consistent patterns involved
TTC39B (Tetratricopeptide Repeat Domain 39B), which is within a known red
coloration QTL in stickleback and implicated in red coloration in other
taxa.
Overview of analyses and contrasts In our main analyses, we used DESeq2
to: identify genes whose expression was correlated with female red
coloration within populations; test whether expression of the same genes
was correlated with female color in different populations; ask if the
genes showing expression correlated with female color also showed
differential expression in analyses including both males and females.
Because some red-associated genes may differ in expression mainly between
populations, we conducted complementary analyses comparing red females
with dull females of different populations in which dull females
predominate. To address similar questions at a systems level, we used
WGCNA to ask if gene networks showing color-correlated differential
expression, in each population possessing red females, were present in the
other population; we conducted an analogous analysis of the two sexes. In
order to identify candidate genes for further study, we asked if any of
the genes consistently associated with red coloration had been linked with
coloration in other species, and if any were associated with stickleback
coloration QTL detected in one of the same populations (Yong et al. 2016).
As a first step before the main analyses, we characterized the overall
structure of the data and broad patterns of variation, by population and
sex, with a principal component analysis of gene expression. Study
populations Fish were collected from populations in British Columbia,
Canada and California, USA. Populations surveyed are Little Campbell
Anadromous (49°00'58 N, 122°46'46 W; henceforth “Anadromous”),
Little Campbell Stream-resident (49°00'43 N, 122°37'30 W),
Salmon River (49°05'29 N, 122°29'34 W), Bonsall Creek drainage
(48°51'05 N, 123°42'40 W), and Matadero Creek (the only
California population: 37°23'36 N, 122°9'46 W). Using minnow
traps and seine/dip nets, fish were collected from the field in spring
2014 and transported to East Carolina University, North Carolina. British
Columbia stickleback were collected in late April and California
stickleback were collected in late June. Fish from Little Campbell
Stream-resident (henceforth LC Stream) and Matadero populations, both
possessing red-females, were sampled so as to ensure high diversity in
female throat coloration, with sufficient numbers of both red and dull
females (which were categorized as red or dull for some analyses). All
fish were held for two weeks to allow acclimation to the lab. They were
kept in 102-liter tanks at approximately 15-20 fish per tank under
natural-spectrum mimicking fluorescent light (Lumichrome® Full Spectrum
Plus, Lumiram Electric Co., Larchmont, NY, USA) and photoperiod at
17–20°C. Fish were fed bloodworms (chironomid larvae) and brine shrimp
twice per day. RNA extraction, library preparation, sequencing, and
generation of read counts Red coloration was assessed following protocols
described in Yong et al. (2013) using an Ocean Optics Maya2000 Pro
spectrophotometer. Prior to collection of spectrometry data, subject fish
were euthanized by MS-222 overdose. Immediately following these
measurements, throat and telencephalon tissue (these results to be
presented in a different MS) were removed and stored in RNAlater. ‘Throat
tissue’ specifically refers to ventral surface tissue between the opercula
and immediately ventral to the ceratohyal cartilage, excluding tissue
immediately ventral to the sternohyoideus; this includes (potentially) red
colored skin tissue on the underside of the head, detached from underlying
cartilage. Additionally, standard length was measured and each fish was
dissected to confirm sex. Remaining tissue samples were immersed in
RNAlater and stored in a -80°C freezer for later use. Animal use protocols
were approved by East Carolina University's Institutional Animal Care
& Use Committee (AUP #D224b) and experiments were performed in
accordance with relevant institutional and national guidelines for the
care and use of laboratory animals. Tissues were homogenized in TRI
reagent with a Bio-Gen 2000 Homogenizer (Pro Scientific) and RNA was
extracted using the TRI Reagent protocol. Samples were then treated with
DNase to remove contaminant DNA, and total RNA isolation was purified
using the manufacturer's instructions for the RNeasy Kit (Qiagen). To
determine RNA quality, RNA samples were run on a RNA 6000 picoLab Chip
using a 2100 Bioanalyzer (Agilent). Average RNA quality for the samples
used in this study was high (RNA integrity number, RIN, = 8.1). RNA was
extracted from throat and brain tissue and stored in a -80°C freezer until
sequencing. cDNA library preparations were conducted by the University of
Illinois Roy J. Carver Biotech Center under the guidance of Dr. Alvaro
Hernandez using Illumina TruSeq SBS sequencing kit version 4. Ninety-six
libraries (52 throat, 44 telencephalon) were multiplexed on a full flow
cell (8 lanes) Illumina HiSeq 2500 to yield approximately 18.8 million
single-end, 100bp reads per library. Fastq files were generated and
demultiplexed with the bcl2fastq v1.8.4 Conversion Software (Illumina).
Following sequencing, standard quality control (adapter removal and
trimming of low-quality bases) was conducted using TrimGalore on default
settings (Martin 2011; Krueger 2015). Reads were mapped to the stickleback
reference genome (Jones et al. 2012) using Tophat version 2.0.13
(Trapnell et al. 2009). Tables of read counts, which were used as input
for most downstream analyses, were generated using HT-Seq version 0.6.0
(Anders et al. 2015). They referenced Ensembl 79. Gene annotations were
from Ensembl 100 unless otherwise indicated. Color measurement and
analysis Reflectance spectra were processed with pavo (Maia et al. 2013)
in the R environment (R Core Team 2017) using a perceptual model of
threespine stickleback vision (Stuckert et al. 2019a) to generate a
measure of chroma, r.vec, which can be thought of as color intensity, and
two indices of wavelength, h.phi and h.theta. The r.vec variable, which in
our analyses is a measure of red intensity and will henceforth be referred
to as chroma, was the intended focus but all three variables proved to be
strongly correlated (see Results). Some heteroscedasticity was present in
chroma data owing largely to limited variance in the consistently low
values of the Bonsall population (note that males in this site appear to
be melanic based on our field observations; we have observed some red
expression in males from this drainage held in the laboratory for an
extended period, but not in males in the present study). This was not
completely eliminated by transformations and significance values were high
and robust, so analyses of raw data, conducted using JMP Pro 14.1.0, are
presented for ease of interpretation. Reflectance data were not available
for one Little Campbell Stream male, but that male’s RNAseq results were
retained in most analyses as other key data were present. Differential
expression analyses To characterize major patterns of variation in
expression data, principal component analyses (PCA) were run using the
plotPCA function of DE-Seq2 version 1.24.0 (Anders and Huber 2010). We
then tested for effects of population and sex using ANOVA, in JMP Pro
14.1.0. ANOVA results for PC1 were checked and confirmed with
non-parametric Wilcoxon tests as some heteroscedasticity was present and
not eliminated by standard transformations. ANOVA results are reported for
consistency. Differential expression analyses were conducted using
DE-Seq2 version 1.24.0 (unless otherwise indicated; Anders and Huber
2010), which tests for differential gene expression using a negative
binomial general linearized model. Effects in DE-Seq2 are presented as
log2 fold change, which for an experiment is log2 (treated/untreated); for
continuous independent variables, as in some of our analyses, the reported
log2 fold change is per unit of change in the continuous variable. Unless
otherwise indicated, alpha values for DESeq2 analyses have been corrected
for multiple testing using the method of Benjamini and Hochberg (1995) and
are denoted as “padj”. We accepted the DESeq2 default independent
filtering of lowly expressed genes, which optimizes the number of adjusted
p-values, resulting in a padj of “NA” for some genes. Uncorrected
significance tests are included in supplemental tables as a complement to
corrected tests. Provisional characterizations of genes not named in
Ensembl are presented when highly significant or otherwise of interest.
Because some populations do not have red females and only one population
lacks red males, our sample design is not balanced. We therefore conducted
our analyses through focused comparisons rather than through a
comprehensive model. 1. Analysis of red females in Matadero and LC Stream
populations We first tested whether expression of the same or different
genes was associated with convergent female red throat coloration within
LC Stream (BC) and Matadero (California) females. Red chroma was treated
as a continuous variable to maximize power. (A) We asked if there were
significant interactions between red chroma and population/region in gene
expression. Genes that show a significant interaction have distinctive
associations with color among populations, as predicted if the genes
differ. If genes are the same, only significant main effects are
expected. (B) We tested for associations between gene expression and
chroma within each of the two populations, after significant interactions
between chroma and population proved common. 2. Comparisons of red females
and red males Next, we tested whether genes whose expression correlated
with red in females (1B above) were also associated with red across the
sexes, as predicted by the genetic correlation hypothesis. Correlations
between chroma and gene expression were analyzed for males and females
together, first with a univariate analysis and then with sex included as a
covariate, and the results compared with those for genes identified in 1B.
The analysis was then repeated with the addition of an interaction term
for sex*red chroma, to test for different relationships between chroma and
gene expression in males versus females. These within-population analyses
were conducted using DESeq2 1.32.0. In addition, we assessed whether in
inter-population contrasts, males and red females of the same populations
differed in expression of the same genes when contrasted with females from
populations lacking red in females (as in 3 below). 3. Gene expression
comparisons of red females with dull females from other populations
Because genes mediating variation in a trait within and between
populations are not necessarily the same, we also conducted an
“inter-population” analysis comparing red females (dull females were
excluded to ensure a meaningful contrast) from each of LC Stream and
Matadero against females from each of two freshwater-resident populations
in which red females are less common (Salmon River) or absent (Bonsall
Creek). That is, we compared red females from LC Stream with dull females
from Salmon and with those from Bonsall, and the same for Matadero
females. While these contrasts are expected to reveal genes that differ by
population independent of color, genes associated with color will also be
encompassed, especially genes that differ in expression between red
females and both of the dull female populations. Functional enrichment
testing Sets of genes differentially expressed with red coloration in
females of each red female population were tested for functional
over-representation using GOrilla (November 25-30, 2020), which tests
rank-ordered gene lists for functional enrichment towards the top of the
list (Eden et al. 2007, Eden et al. 2009). To run these analyses,
stickleback Ensembl gene stable IDs were converted to those for zebrafish,
which for some genes resulted in longer lists owing to zebrafish
possessing multiple members of a gene family that all correspond to a
single stickleback gene stable ID. To avoid artifacts we retained only the
first (ordered by name) in the list of zebrafish genes corresponding to a
stickleback gene; these results are presented. We tested the robustness of
the results obtained by instead retaining only the last gene in the zebra
fish list and rerunning the analysis. Using default GOrilla settings, padj
of 0.05, and the “Process” ontology, the results were identical for the
number of Gene Ontology terms shared between analyses with Matadero and
with LC Stream females. Candidate Gene Annotation and Analysis Because
some pigmentation pathways may not be thoroughly covered by gene ontology
analyses (Baxter et al.2019), we also compared differentially expressed
genes with a list of candidate pigmentation genes associated with
pigmentation/color phenotypes. First, we compiled stickleback genes
orthologous to 650 genes in a manually curated list associated with
integumentary pigmentation phenotypes in humans, mice and zebrafish
(Baxter et al. 2019). We used Ensembl (98; Sep. 29, 2019) to search for
stickleback orthologs, retaining those which Ensemble assigned high
orthology confidence (1 rather than 0). Of the 619 zebrafish genes in the
list, we retained 526 stickleback orthologs; of 30 mouse genes lacking
zebrafish homologs, we retained four; one human gene with neither a
zebrafish nor mouse homologue lacked a stickleback homologue. As a
supplement to this extensively curated list of 530 genes, we used
stickleback orthologs of genes in a color gene list assembled by
Stuckert et al. (2019b), derived from a broader array of taxa but less
extensively annotated and vetted. We first used Ensembl to search for
stickleback orthologs corresponding to gene names. To be thorough, we then
used the list to search for human and zebrafish genes, in turn using those
to search for stickleback orthologs. We retained those with a stickleback
gene name which Ensembl assigned high orthology confidence (1 rather than
0); we merged the three lists and removed duplicate genes. We augmented
this set with genes associated with carotenoid pigmentation based on
Toews et al. 2017, again using Ensembl to identify stickleback genes, a
total of 10, corresponding to carotenoid related
genes BCO1, BCO2 and CYP2J19 (though the latter is not expected to have
direct homologues in fish); we used multiple stickleback genes from the
same subfamilies where one to one orthology could not readily be
established. TTC39B has been found to be pigment related (Hooper et al.
2019; Salis et al. 2019; Ahi et al. 2020a, b) so it was also added (and
its second copy). The Stuckert et al. (2019b)/Toews et al. (2017) lists
added 147 new genes. With Ensembl 100, some gene identifications changed
such that 13 were removed, for a final total of 666 pigment-related genes
(final list in Suppl. Table 1). We describe genes from this list as
candidate pigmentation genes. In addition, genes were assigned to QTL
Bayesian credible intervals from Yong et al. (2016) using the assembly
update of Glazer et al. (2015). WGCNA analyses We conducted network
analyses of RNAseq data using the R package WGCNA 1.68. Weighted gene
correlation network analysis (WGCNA) is a systems biology method that can
be used to find clusters, called modules, of genes with highly correlated
expression (Langfelder and Horvath 2008; Langfelder et al. 2011). We
analyzed networks based on expression data transformed using DESeq2’s
variance-stabilizing transformation (Anders and Huber 2010), with all
genes not expressed in at least 50% of samples removed and a minimum
module size of 30. We tested correlation of modules with red intensity in
R, using the method of Benjamini and Hochberg (1995) to correct for
multiple tests. As with DE-Seq analyses, we used WGCNA to test for
similarities and differences between Matadero and LC Stream populations.
To do this, we built separate networks for each population, including
males and females. Due to small sample sizes for LC Stream, we included
the adjacent (and similar in overall gene expression) Salmon River
population for network construction. The resulting modules were tested for
associations with color (in both sexes and in females only). All modules,
including color-associated modules, were then tested for preservation of
network structure among populations. It is important to keep in mind that
both males and females were included in these WGCNA analyses, when
interpreting relationships between WGCNA modules and traits in a single
sex, and that sample sizes were relatively small for this method. We also
tested for network preservation between sexes to identify whether genes
associated with color in females were preserved in males. To do this
required larger sample sizes, so we built another gene co-expression
network for all of the British Columbia fish together. Using these
modules, we tested for module preservation of color-associated modules
between males and females.
Reflectance data are provided in individual files generated by the
spectrometer and saved in a compressed folder, “Files1029,19Analyses.zip”.
Population and sex information are provided in file names as described for
the column titles in the main data file. An R script,
“McKinnon_R_code_Reflectance_Analyses”, is provided for generating chroma
measures from these data using a model of stickleback visual perception
based on cone data provided in the file,
“stickleback.visual.sensitivityJSM1030,19.csv”. Reflectance data are
lacking for one Little Campbell stream resident male, as described in the
MS and indicated in R code for the man analyses. The chroma and other
values generated were initially analyzed in JMP as described in the MS.
They are also included in the R scripts for the analyses of chroma by
population and sex. The core data set is
“Complete_Throat_Count_Table0328,19”, which provides expression data of
22,456 genes for 51 threespine stickleback from Western North American
populations. The R code that accompanies the data set
(McKinnon_R_code_DESeq2_Analyses) assigns population, sex, and color
measurements to each individual but the sex and population of each fish
can also be inferred from the ID code at the top of each column. The first
letter indicates the population: A-anadromous (from the lower Little
Campbell); L-Little Campbell stream-resident; S-Salmon River; B-Bonsall;
M-Matadero. The second letter indicates sex: M-male; F-female. Principal
component analysis scripts are provided in:
McKinnon_R_code_PrincipComponent_Analyses The principal components
themselves were analyzed in JMP as described in the MS. DESeq analysis
scripts are provided in McKinnon_R_code_DESeq2_Analyses. Data used in the
WGCNA analyses of population are samples from specific populations that
came from the main date set: Complete_Throat_Count_Table0328_19.txt, after
variance stabilization in DESeq2 and removal of genes not expressed in at
least 50% of samples or expressed in the Matadero population. For the
WGCNA analyses by population (R scripts in
McKinnon_R_code_WGCNA_PopAnalyses) the resulting data files were:
stickleback_vsd.throats,HalfPlusExprsdMAT,ZerosRmvd120319.csv
stickleback_vsd.throats,HalfPlusExprsdLC,Salm,MATZerosRmvd120619.csv Those
scripts also required (for preliminary analyses in WGCNA) information
about traits in those populations, which were in:
traits.throatsMats101819.csv, traits.throatsLCSalm120619.csv Data used in
the WGCNA analyses of sex are samples of each sex from British Columbia
populations that came from the main date set:
Complete_Throat_Count_Table0328_19.txt, after variance stabilization in
DESeq2 and removal of genes not expressed in at least 50% of samples. The
resulting data files (R scripts in McKinnon_R_code_WGCNA_SexAnalyses_ )
were: stickleback_vsd.throats042619_HalfPlusExprsdNoMATS_Males120819.csv
stickleback_vsd.throats042619_HalfPlusExprsdNoMATS_Females120819.csv Those
scripts also required (for preliminary analyses in WGCNA) information
about traits in those populations, which were in:
traits.throats0429_19NoMATS_Males120819.csv,
traits.throats0429_19NoMATS_Females120819.csv