10.5061/DRYAD.Q7M07
Kanda, Kojun
Oregon State University
Pflug, James M.
Oregon State University
Sproul, John S.
Oregon State University
Dasenko, Mark A.
Oregon State University
Maddison, David R.
Oregon State University
Data from: Successful recovery of nuclear protein-coding genes from small
insects in museums using Illumina sequencing
Dryad
dataset
2016
Bembidion
museomics
historic DNA
multi-locus data
reference-based assembly
Carabidae
de novo assembly
Tenebrionidae
beetle
nuclear protein-coding gene
degraded DNA
next-generation sequencing (NGS)
natural history collections
genome skimming
2016-11-23T00:00:00Z
2016-11-23T00:00:00Z
en
https://doi.org/10.1371/journal.pone.0143929
32576133 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
In this paper we explore high-throughput Illumina sequencing of nuclear
protein-coding, ribosomal, and mitochondrial genes in small, dried insects
stored in natural history collections. We sequenced one tenebrionid beetle
and 12 carabid beetles, 3.7 to 9.7 mm in length, that had been stored in
various museums for 4 to 84 years. Although we chose a
number of old, small specimens for which we expected low sequence
recovery, we successfully recovered at least some low-copy nuclear
protein-coding genes from all specimens. For example, in one 56-year-old
beetle, 4.4 mm in length, our de novo assembly recovered about 63% of
approximately 41,900 nucleotides in a target suite of 67 nuclear
protein-coding gene fragments, and 70% using a reference-based assembly.
Even in the least successfully sequenced carabid specimen, reference-based
assembly yielded fragments that were at least 50% of the target length for
34 of 67 nuclear protein-coding gene fragments. Exploration of alternative
references for reference-based assembly revealed few signs of bias created
by the reference. For all specimens we recovered almost complete copies of
ribosomal and mitochondrial genes. We verified the general accuracy of the
sequences through comparisons with sequences obtained by PCR and Sanger
sequencing, including sequences from fresh conspecific specimens, and through
phylogenetic analysis that tested the placement of sequences in predicted
regions. A few possible inaccuracies in the sequences were detected, but
these rarely affected the phylogenetic placement of the samples. Although
our sample sizes are small, an exploratory regression analysis suggests that
the dominant factor in predicting success at recovering nuclear
protein-coding genes is a high number of Illumina reads, with success at
PCR amplification of COI and killing by immersion in ethanol as secondary
factors; in analyses of only high-read samples, the primary significant
explanatory variable was body length, with smaller beetles being more
successfully sequenced.
CarabidAnalysesFinalMainTrees: The NEXUS file, written for Mesquite, that was
used to test for accurate phylogenetic placement of sequences obtained from
Illumina sequencing of carabid museum specimens in the context of a larger
phylogeny. The file contains matrices for seven genes and the trees produced
by the analyses.

BembidionTransversaleGroupAnalysis: The NEXUS file, written for Mesquite,
that we used to test the accuracy of Illumina sequences obtained from carabid
museum specimens in the context of conspecific and very closely related
species. The file contains matrices and phylogenetic trees for four genes.

Lagriinae_AnalyzedMatrices: The NEXUS file, written for Mesquite, that was
used to test for accurate phylogenetic placement of Illumina sequences
obtained from tenebrionid museum specimens in a larger phylogeny.

FocalGene_QuerySet_Tribolium: The query sequences from the Tribolium
castaneum genome that were used to identify fragments of 7 focal genes in
reads from Illumina-sequenced carabid and tenebrionid museum specimens.

NPG_QuerySet_Tribolium: Query sequences from the Tribolium castaneum genome
that were used to detect the presence of 67 nuclear protein-coding genes in
Illumina reads of tenebrionid museum specimens.

Bembidion_transversale_DNA3205_67PCG: Query sequences from Bembidion
transversale genomic reads that were used to detect the presence of 67
nuclear protein-coding genes in Illumina reads of carabid museum specimens.