10.5061/DRYAD.P25VN
Shokralla, Shadi
Mansoura University
University of Guelph
Gibson, Joel F.
University of Guelph
Nikbakht, Hamid
University of Guelph
Janzen, Daniel H.
University of Pennsylvania
Hallwachs, Winnie
University of Pennsylvania
Hajibabaei, Mehrdad
University of Guelph
Data from: Next-generation DNA barcoding: using next-generation sequencing
to enhance and accelerate DNA barcode capture from single specimens
Dryad
dataset
2014
Genomics/Proteomics
Next-generation sequencing
species identification
2014-01-27T20:11:22Z
2014-01-27T20:11:22Z
en
https://doi.org/10.1111/1755-0998.12236
8744453 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
DNA barcoding is an efficient method to identify specimens and to detect
undescribed/cryptic species. Sanger sequencing of individual specimens is
the standard approach in generating large-scale DNA barcode libraries and
identifying unknowns. However, the Sanger sequencing technology is, in
some respects, inferior to next-generation sequencers, which are capable
of producing millions of sequence reads simultaneously. Additionally,
direct Sanger sequencing of DNA barcode amplicons, as practiced in most
DNA barcoding procedures, is hampered by the need for relatively
high-target amplicon yield, coamplification of nuclear mitochondrial
pseudogenes, confusion with sequences from intracellular endosymbiotic
bacteria (e.g. Wolbachia) and instances of intraindividual variability
(i.e. heteroplasmy). Any of these situations can lead to failed Sanger
sequencing attempts or ambiguity of the generated DNA barcodes. Here, we
demonstrate the potential application of next-generation sequencing
platforms for parallel acquisition of DNA barcode sequences from hundreds
of specimens simultaneously. To facilitate retrieval of sequences obtained
from individual specimens, we tag individual specimens during PCR
amplification using unique 10-mer oligonucleotides attached to DNA
barcoding PCR primers. We employ 454 pyrosequencing to recover full-length
DNA barcodes of 190 specimens using 12.5% capacity of a 454 sequencing run
(i.e. two lanes of a 16-lane run). We obtained an average of 143 sequence
reads for each individual specimen. The sequences produced are full-length
DNA barcodes for all but one of the included specimens. In a subset of
samples, we also detected Wolbachia, nontarget species, and heteroplasmic
sequences. Next-generation sequencing is of great value because of its
protocol simplicity, greatly reduced cost per barcode read, faster
throughput and added information content.
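The tagging step described above (a unique 10-mer attached to each specimen's PCR primer) implies that pooled reads can be sorted back to specimens by their leading tag. The following is a minimal sketch of that idea, not the authors' pipeline. It assumes the tag sits at the very start of each read and must match exactly; the function name `demultiplex`, the tag sequences, the specimen labels, and the reads are all hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, tag_to_specimen, tag_len=10):
    """Group reads by their leading tag and strip the tag from assigned reads.

    Assumption: the tag occupies the first `tag_len` bases of each read and
    must match a known tag exactly (no mismatch tolerance).
    """
    by_specimen = defaultdict(list)
    for read in reads:
        tag, insert = read[:tag_len], read[tag_len:]
        specimen = tag_to_specimen.get(tag)
        if specimen is None:
            # Unknown tag: keep the full read for later inspection.
            by_specimen["unassigned"].append(read)
        else:
            by_specimen[specimen].append(insert)
    return dict(by_specimen)


# Hypothetical 10-mer tags and reads, for illustration only.
tags = {"ACGTACGTAC": "specimen_A", "TGCATGCATG": "specimen_B"}
reads = [
    "ACGTACGTACGGTCAACAAATC",
    "TGCATGCATGGGTCAACAAATC",
    "NNNNNNNNNNGGTCAACAAATC",
]
groups = demultiplex(reads, tags)
```

Real 454 reads contain sequencing errors, so a practical workflow would likely tolerate mismatches in the tag and also trim the primer sequence; both are left out here to keep the sketch short.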
454 Pyrosequencing-plate 1: 7.454Reads-set1 unique names.fasta
454 Pyrosequencing-plate 2: 3.454Reads-set2 unique names.fasta
Sanger sequences Plate 1: Plate 1.fasta
Sanger sequences Plate 2: Plate 2.fasta
Tree_file