10.5061/DRYAD.63XSJ3TZ8
Schwartz, Tonia S
0000-0002-7712-2810
Auburn University
Waits, Damien S
Auburn University
Simpson, Dasia Y
Auburn University
Sparkman, Amanda M
Westmont College
Bronikowski, Anne M
Iowa State University
The utility of reptile blood transcriptomes in molecular ecology
Dryad
dataset
2019
Reptiles
James S. McDonnell Foundation
https://ror.org/03dy4aq19
220020353
National Science Foundation
https://ror.org/021nxhr62
1560115
2019-11-07T00:00:00Z
2019-11-07T00:00:00Z
en
https://doi.org/10.1111/1755-0998.13110
1552600097 bytes
3
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Reptiles and other non-mammalian vertebrates have transcriptionally active
nucleated red blood cells. If blood transcriptomes can provide
quantitative data to address questions relevant to molecular ecology, this
could circumvent the need to euthanize animals to assay tissues. This
would allow longitudinal sampling of animals’ responses to treatments, as
well as sampling of protected taxa. We developed and annotated blood
transcriptomes from six reptile species. We found that, on average, 25,000
proteins are transcribed in the blood, and that a CORE group of 9,282
orthogroups is found in at least four of the six species. In comparison to
liver transcriptomes from the same taxa, approximately two-thirds of the
orthogroups were found in both blood and liver, and a similar percentage
of ecologically relevant gene groups (insulin and insulin-like signaling,
electron transport chain, oxidative stress, glucocorticoid receptors) was
found transcribed in both blood and liver.
As a resource, we provide a user-friendly database of gene IDs identified
in each blood transcriptome. Although on average 37% of reads mapped to
hemoglobin, the majority of non-hemoglobin transcripts had sufficient
depth (e.g., 97% at >10 reads) to be included in differential gene
expression analysis. Thus, we demonstrate that RNAseq of blood from a very
small sample (<10 µl) is a minimally invasive option in non-mammalian
vertebrates for quantifying expression of a large number of ecologically
relevant genes, longitudinally and in protected populations.
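The summary statistics above can be reproduced from simple tables: the CORE set of orthogroups shared by at least four of six species, the share of orthogroups found in both blood and liver, and the hemoglobin read fraction compared with transcript depth. The following Python sketch is illustrative only. It assumes a hypothetical species-to-orthogroup presence table (for example, parsed from OrthoFinder output) and a per-transcript read-count table in which hemoglobin transcripts are already annotated. The data are toy values. The denominator for the shared fraction is not stated here, so using the union of both tissues is an assumption.

```python
from collections import Counter


def core_orthogroups(presence: dict[str, set[str]], min_species: int = 4) -> set[str]:
    """Orthogroups detected in at least `min_species` species."""
    counts = Counter(og for groups in presence.values() for og in groups)
    return {og for og, n in counts.items() if n >= min_species}


def shared_fraction(blood: set[str], liver: set[str]) -> float:
    """Share of orthogroups detected in both tissues (assumed denominator: their union)."""
    union = blood | liver
    return len(blood & liver) / len(union) if union else 0.0


def depth_summary(counts: dict[str, int], hemoglobin_ids: set[str],
                  min_reads: int = 10) -> tuple[float, float]:
    """(share of reads on hemoglobin, share of other transcripts with > min_reads reads)."""
    total = sum(counts.values())
    hb_reads = sum(n for tid, n in counts.items() if tid in hemoglobin_ids)
    other = [n for tid, n in counts.items() if tid not in hemoglobin_ids]
    hb_share = hb_reads / total if total else 0.0
    deep_share = sum(n > min_reads for n in other) / len(other) if other else 0.0
    return hb_share, deep_share


# Hypothetical toy data, not values from this dataset.
presence = {
    "sp1": {"OG1", "OG2", "OG3"}, "sp2": {"OG1", "OG2"}, "sp3": {"OG1", "OG2", "OG4"},
    "sp4": {"OG1", "OG3"}, "sp5": {"OG1", "OG2"}, "sp6": {"OG5"},
}
counts = {"HBA": 250, "HBB": 120, "g1": 400, "g2": 200, "g3": 25, "g4": 5}
print(sorted(core_orthogroups(presence)))                             # ['OG1', 'OG2']
print(shared_fraction({"OG1", "OG2", "OG3"}, {"OG2", "OG3", "OG4"}))  # 0.5
print(depth_summary(counts, {"HBA", "HBB"}))                          # (0.37, 0.75)
```

Lowering `min_species` or `min_reads` shows how sensitive each summary is to its cutoff.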
Taxon Sampling
Six reptile species (3 snakes, 2 lizards, and a turtle) were included in
this study (Table 1) for development of blood transcriptomes. These were
chosen based on reptile taxonomic diversity and interest to our research
groups. Blood was taken using a heparinized needle (U-100 BD Micro-Fine™
IV Insulin Syringes 28 Gauge, 1 mL 12.7 mm (1/2")). For all animals, the
blood taken from the caudal vein in the tail was <1% of body weight (20 µl
to 300 µl for these animals). Blood was either (1) put immediately into
RNAlater (Ambion) (<1:5 ratio of blood to RNAlater) in a 2 ml screw-cap
tube and kept on ice, as would be typical in field settings, until stored
at 4°C, or (2) centrifuged (1000 × g for 5 min) to separate blood
components, which were flash frozen in liquid nitrogen as plasma and red
blood cells in 2 ml screw-cap tubes and stored at -80°C, as is common in
laboratory settings.
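The two sampling constraints above are (a) drawing less than 1% of body weight as blood and (b) using at most a 1:5 ratio of blood to RNAlater. Together they give a quick pre-sampling volume check. This is a hedged sketch, not part of the original protocol. It assumes a blood density of about 1 g/ml, so that grams convert directly to milliliters, and it treats the full nominal 2 ml tube volume as usable. The function name and the 50 g example animal are hypothetical.

```python
def sampling_plan(body_mass_g: float, draw_ul: float,
                  rnalater_ratio: float = 5.0, tube_ml: float = 2.0) -> dict:
    """Check a planned draw against the <1% body-weight rule and size the
    RNAlater volume for a blood:RNAlater ratio of 1:rnalater_ratio."""
    # Assumption: blood density ~1 g/ml, so 1% of the body mass in grams is
    # ~0.01 * mass ml, i.e. mass * 10 microliters.
    max_draw_ul = body_mass_g * 10.0
    min_rnalater_ul = draw_ul * rnalater_ratio
    return {
        "max_draw_ul": max_draw_ul,
        "within_limit": draw_ul < max_draw_ul,
        "min_rnalater_ul": min_rnalater_ul,
        # Assumption: the whole nominal tube volume is usable.
        "fits_tube": draw_ul + min_rnalater_ul <= tube_ml * 1000.0,
    }


# Hypothetical 50 g lizard with a planned 100 µl draw.
print(sampling_plan(50.0, 100.0))
```

For the largest draw reported here (300 µl), five volumes of RNAlater add 1.5 ml, so blood plus preservative still fits in a 2 ml tube.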
These are two approaches (RNAlater versus snap-freezing in liquid
nitrogen) that are commonly used to preserve RNA in tissue, and this study
demonstrates that both produce high quality RNA and RNAseq data when used
on blood. The blood cell pellet (hematocrit) is typically about one half
of the whole blood volume; thus, 10 µl of blood cells is obtainable from
20 µl of whole blood. In reptiles, the blood cell pellet is almost
entirely red blood cells, with a fine layer of white blood cells at the
interface between the red blood cells and the plasma, and is therefore
referred to as the red blood cell (RBC) pellet. All procedures were
approved by the IACUC at the respective universities or agency of the
individual collecting the sample.
RNA Isolation
Blood that was in RNAlater was centrifuged at 1000 × g for 5 minutes to
pellet the RBCs, and the RNAlater was pipetted off. From either the
snap-frozen RBC pellet or the RNAlater RBC pellet, we used 10 µl of
pelleted blood cells for RNA isolation using the Ambion RiboPure Kit, with
DNase digestion as described by the manufacturer. Purified RNA was
analyzed on a Bioanalyzer (Agilent) to assess its quality and quantity.
From these 10 µl of blood cells we obtained between 4.5 µg and 8.9 µg of
RNA, far more than was needed for RNAseq. All samples had a RIN >7.5.
RNA-seq Library Preparation and Sequencing
We sent 1 µg of total RNA to the Heflin Genomic Center at the University
of Alabama at Birmingham. Barcoded libraries were prepared using the
Agilent SureSelect Stranded library kit (Agilent Technologies, Santa
Clara, CA) as described by the manufacturer. Briefly, 100 ng of total RNA
was subjected to two rounds of poly A+ selection using oligo dT magnetic
beads. The mRNA was randomly fragmented, and first strand cDNA synthesis
was performed in the presence of random hexamers and 2.4 ng/µL (final
concentration) of Actinomycin D using standard techniques.
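The volumes and yields reported above chain into a simple RNA budget. Whole blood converts to an RBC pellet via the hematocrit, the pellet converts to total RNA via the per-10 µl yield, and total RNA converts to a count of 100 ng library inputs. The minimal sketch below takes the hematocrit of about 0.5 and the 100 ng input from the text. It assumes the whole pellet is extracted with the reported per-10 µl yield. The function name is hypothetical.

```python
def rna_budget(whole_blood_ul: float, yield_ug_per_10ul_cells: float,
               hematocrit: float = 0.5, library_input_ng: float = 100.0):
    """Estimate RBC pellet volume (µl), total RNA (µg), and whole library inputs."""
    pellet_ul = whole_blood_ul * hematocrit                        # whole blood -> RBC pellet
    total_rna_ug = pellet_ul / 10.0 * yield_ug_per_10ul_cells      # scale the per-10 µl yield
    n_inputs = int(total_rna_ug * 1000.0 // library_input_ng)      # µg -> ng, whole inputs only
    return pellet_ul, total_rna_ug, n_inputs


# Lower end of the reported yield range: 20 µl of blood, 4.5 µg per 10 µl of cells.
print(rna_budget(20.0, 4.5))  # (10.0, 4.5, 45)
```

Even at the low end of the yield range, one extraction covers the 1 µg sent for sequencing several times over.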
After second strand synthesis was complete, the cDNA was adenylated and
used in a ligation reaction to add primary adaptors for flow cell
attachment with barcode information. The sequencing libraries were pooled
in equimolar amounts and run on the Illumina HiSeq2500 using a Rapid Run
flow cell with paired-end 100 bp sequencing reads, aiming for 20 million
reads/sample. Following completion of the run, the .bcl files were
converted to FASTQ file format using BCL2FASTQ 1.8.4 from Illumina.
Liver Transcriptomes
For comparison to our blood transcriptomes, we downloaded the liver
transcriptomes from Dryad (McGaugh et al., 2015a) for two species that
overlap with our blood transcriptomes (T. elegans, E. multicarinata) and
for a third species that shares a genus with one of our blood
transcriptome species (Sceloporus undulatus) (Table 1).
Blood Transcriptome Assembly
The bioinformatic pipeline is shown in Figure 1. Read quality was
assessed with FastQC (http://bioinformatics.babraham.ac.uk/projects/fastqc/)
before cleaning. Using Trimmomatic (Bolger, Lohse, & Usadel, 2014), low
quality bases were removed from the raw reads. To reduce biases, the first
10 bases of each read were removed, and any reads shorter than 30 bases
were discarded.
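The read cleaning described above drops the first 10 bases of every read and discards reads shorter than 30 bases. Both steps fit in a few lines of code. This is a minimal Python sketch for illustration, not Trimmomatic itself. It assumes Phred+33 quality encoding. The quality-trimming settings actually used are not given here, so it adds a simple 3' quality trim with a hypothetical threshold of Q20. In Trimmomatic, the crop and length filter correspond to its HEADCROP and MINLEN steps.

```python
def clean_read(seq: str, qual: str, headcrop: int = 10, min_len: int = 30,
               trailing_q: int = 20):
    """Return the cleaned (seq, qual) pair, or None if the read is discarded."""
    # Remove the first `headcrop` bases (the role of Trimmomatic's HEADCROP step).
    seq, qual = seq[headcrop:], qual[headcrop:]
    # Assumption: Phred+33 qualities; drop 3' bases below `trailing_q`
    # (similar in spirit to Trimmomatic's TRAILING step; threshold is hypothetical).
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - 33 < trailing_q:
        end -= 1
    seq, qual = seq[:end], qual[:end]
    # Discard reads that are now shorter than `min_len` (like MINLEN).
    return (seq, qual) if len(seq) >= min_len else None


# 50 bp read: 10 leading bases, 35 high-quality bases, 5 low-quality 3' bases.
read = ("N" * 10 + "A" * 40, "#" * 10 + "I" * 35 + "#" * 5)
print(clean_read(*read))
```

The order of operations matters. Cropping and quality trimming run before the length check, so the 30-base minimum applies to what remains after both.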
Read quality was assessed again using FastQC. Transcriptomic reads were
de novo assembled using Trinity 2.2.0 (Haas et al., 2013) with the
default parameters; we refer to this as the Raw Assembly.
Metagenomic Contamination Screening
Contamination screening was performed on the contigs from each assembled
blood transcriptome and on the liver transcriptomes from McGaugh et al.
(2015a). We performed DIAMOND (Buchfink, Xie, & Huson, 2015) blastp
searches against NCBI’s non-redundant (nr) protein database (e-value
cutoff of 1E-10), sorted the resulting reports by bitscore, then e-value,
then percent identity, and used a custom Perl script to isolate any
sequences whose top hit matched a non-vertebrate. These contaminant
non-vertebrate contigs were removed from the raw transcriptomes, and we
refer to the results as the “cleaned transcriptomes”.
Reference Blood Transcriptomes
After removing the non-vertebrate sequences from the original blood
transcriptome assemblies, we used TransDecoder
(https://github.com/TransDecoder/) to identify the longest open reading
frames and generate peptide files. The longest open reading frames were
passed to the UCLUST algorithm implemented in usearch7 (Edgar, 2010) to
cluster the transcripts within each transcriptome at an identity threshold
of 90%.
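UCLUST uses a greedy centroid scheme. Each sequence is compared against the existing centroids and joins one that it matches at or above the identity threshold. If it matches none, it becomes a new centroid. The Python sketch below illustrates that idea only; it is not usearch7. It assumes sequences are processed longest first and checks centroids in creation order. It scores identity with a plain Needleman-Wunsch global alignment (identical columns divided by alignment length). All three choices are simplifications that differ from usearch's own identity definition and k-mer search heuristics. The function names are hypothetical. The alignment is quadratic in sequence length, so the sketch suits toy inputs only.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> float:
    """Needleman-Wunsch global alignment; identity = identical columns / alignment length."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical and total columns.
    i, j, same, cols = n, m, 0, 0
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            same += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        cols += 1
    return same / cols if cols else 1.0


def greedy_cluster(seqs: dict[str, str], threshold: float = 0.90) -> dict[str, list[str]]:
    """Map each centroid ID to its member IDs (centroid first)."""
    clusters: dict[str, list[str]] = {}
    # Assumption: longest sequences first, so each centroid is its cluster's longest member.
    for sid in sorted(seqs, key=lambda s: len(seqs[s]), reverse=True):
        for cid in clusters:
            if global_identity(seqs[sid], seqs[cid]) >= threshold:
                clusters[cid].append(sid)
                break
        else:
            clusters[sid] = [sid]
    return clusters


seqs = {
    "t1": "ACGTACGTACACGTACGTAC",
    "t2": "ACGTACGTACACGTACGTAA",  # one mismatch vs t1 (95% identity)
    "t3": "GGGGGGGGGGTTTTTTTTTT",
}
print(greedy_cluster(seqs))  # {'t1': ['t1', 't2'], 't3': ['t3']}
```

The dictionary keys play the role of the centroids that are kept as representative sequences.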
Resulting centroids were kept as representative sequences for each
cluster. We refer to these centroids from the cleaned, clustered
assemblies as the Reference Blood Transcriptomes.
Functional Annotation
Both the raw and the reference transcriptome assemblies were annotated
with the Trinotate annotation pipeline version 3.0 (Bryant et al., 2017),
which used TransDecoder to identify the longest open reading frame peptide
candidates and compared them to the Swiss-Prot (Bateman et al., 2017) and
Pfam (Finn et al., 2016) databases, with signal peptide and transmembrane
domain prediction by SignalP (Petersen, Brunak, von Heijne, & Nielsen,
2011) and TMHMM (Sonnhammer, von Heijne, & Krogh, 1998). We also ran
custom BLAST searches (blastx and blastp, e-value cutoff of 1E-10)
(Altschul, Gish, Miller, Myers, & Lipman, 1990) against the Anolis
carolinensis 2.0, Gallus gallus 5.0, and Homo sapiens GRCh38.p12 genomes
from ENSEMBL release 92 (Zerbino et al., 2018). These transcriptomes and
annotation files are provided in this Dryad repository. Additionally,
translated transcripts were checked for completeness using the BUSCO
tetrapoda database (Simão, Waterhouse, Ioannidis, Kriventseva, &
Zdobnov, 2015). TransDecoder peptide files from both the cleaned blood
transcriptomes and the cleaned liver transcriptomes were passed to
OrthoFinder (Emms & Kelly, 2015) for orthology inference using
all-vs-all blastp searches. To assign similarity-based protein
identifications to the resulting putative homologous proteins, we
performed BLAST searches against the three genomes noted above.
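The contamination screen and the similarity-based identifications described above both depend on picking one top hit per query from tabular search output. The code for the custom Perl script is not given here, so the sketch below is a hedged Python stand-in and not the authors' script. It reads BLAST/DIAMOND tabular output in the default 12-column layout. It ranks hits by bitscore (descending), then e-value (ascending), then percent identity (descending), keeps the top hit per query, and flags queries whose top hit is a non-vertebrate. The subject-to-lineage lookup `is_vertebrate` is a hypothetical dictionary. In practice it would come from a taxonomy mapping for the database used.

```python
import csv
from typing import Iterable, Iterator

# Default 12-column tabular layout (BLAST -outfmt 6, DIAMOND --outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def read_tabular(path: str) -> Iterator[list[str]]:
    """Yield rows from a tab-separated search report."""
    with open(path, newline="") as handle:
        yield from csv.reader(handle, delimiter="\t")


def top_hits(rows: Iterable[list[str]], evalue_max: float = 1e-10) -> dict[str, dict]:
    """Best hit per query: highest bitscore, then lowest e-value, then highest identity."""
    best: dict[str, tuple] = {}
    for row in rows:
        hit = dict(zip(COLUMNS, row))
        evalue = float(hit["evalue"])
        if evalue > evalue_max:
            continue
        # Smaller tuple = better hit under the bitscore > e-value > identity ranking.
        key = (-float(hit["bitscore"]), evalue, -float(hit["pident"]))
        query = hit["qseqid"]
        if query not in best or key < best[query][0]:
            best[query] = (key, hit)
    return {query: hit for query, (_, hit) in best.items()}


def non_vertebrate_contigs(top: dict[str, dict], is_vertebrate: dict[str, bool]) -> set[str]:
    """Queries whose top hit is a subject marked non-vertebrate (unknown subjects not flagged)."""
    return {q for q, hit in top.items() if is_vertebrate.get(hit["sseqid"]) is False}


# Hypothetical report rows and lineage lookup.
rows = [
    ["c1", "vert_A", "90.0", "100", "10", "0", "1", "100", "1", "100", "1e-50", "200"],
    ["c1", "bact_B", "95.0", "100", "5", "0", "1", "100", "1", "100", "1e-60", "250"],
    ["c2", "vert_C", "99.0", "100", "1", "0", "1", "100", "1", "100", "1e-40", "150"],
    ["c2", "bact_D", "80.0", "100", "20", "0", "1", "100", "1", "100", "1e-40", "150"],
]
lineage = {"vert_A": True, "bact_B": False, "vert_C": True, "bact_D": False}
top = top_hits(rows)
print(non_vertebrate_contigs(top, lineage))  # {'c1'}
```

For a real report, `top_hits(read_tabular(path))` replaces the inline rows. The flagged query IDs are the contigs to remove from the assembly.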
This package contains the assembled transcriptomes from blood (.fastq),
annotation files for the blood transcriptomes and the liver transcriptomes
from McGaugh et al. 2015 PNAS (.xls), and two supplemental .xls files.
Supplemental File 1 is an Excel file of the Candidate Functional Pathway
Genes and the transcript IDs (from translated raw transcriptomes) for
each species. Supplemental File 2 is an Excel database listing the genes
found in each transcriptome. It is provided as a resource for researchers
who are considering blood transcriptomes and want to check which genes
and candidate gene groups are expressed, and can therefore be assayed, in
their reptile system. Raw blood RNAseq data have been deposited in the
NCBI SRA database under accession SRP135786 (runs SRR6841717 to
SRR6841722).