10.5061/DRYAD.G14K2
Alberdi, Antton
University of Copenhagen
Aizpurua, Ostaizka
University of Copenhagen
Gilbert, M. Thomas P.
National Taiwan Normal University
University of Copenhagen
Curtin University
Bohmann, Kristine
University of East Anglia
University of Copenhagen
Data from: Scrutinizing key steps for reliable metabarcoding of environmental samples
Dryad
dataset
2018
High throughput sequencing
Metabarcoding primers
Myotis emarginatus
Rhinolophus mehelyi
Rhinolophus ferrumequinum
Primer bias
Myotis myotis
taxonomic assignment
Myotis capaccinii
Miniopterus schreibersii
Operational Taxonomic Unit
Rhinolophus euryale
PCR replicates
Myotis daubentonii
present
Faecal samples
Molecular diet analyses
biodiversity assessment
Chiroptera
Rhinolophus hipposideros
2018-06-29T00:00:00Z
2018-06-29T00:00:00Z
en
https://doi.org/10.1111/2041-210x.12849
11524323409 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
1. Metabarcoding of environmental samples has many challenges and limitations that require carefully considered laboratory and analysis pipelines to ensure reliable results. We explore how decisions regarding study design, laboratory work and bioinformatic processing affect the final results, and provide guidelines for the reliable study of environmental samples.

2. We evaluate the performance of four primer sets targeting COI and 16S regions to characterise arthropod diversity in bat faecal samples, and investigate how metabarcoding results are affected by parameters including i) the number of PCR replicates per sample, ii) sequencing depth, iii) PCR replicate processing strategy (i.e. either additively, by combining the sequences obtained from the PCR replicates, or restrictively, by only retaining sequences that occur in multiple PCR replicates for each sample), iv) the minimum copy number for sequences to be retained, v) chimera removal, and vi) similarity thresholds for OTU clustering. Lastly, we measure within- and between-taxa dissimilarities using sequences from public databases to determine the most appropriate thresholds for OTU clustering and taxonomy assignment.

3. Our results show that the use of multiple primer sets reduces taxonomic biases and increases taxonomic coverage. Taxonomic profiles resulting from each primer set are principally affected by how many PCR replicates are carried out per sample and how sequences are filtered across them, the sequence copy number threshold and the OTU clustering threshold. We also report considerable diversity differences between PCR replicates of the same sample. Sequencing depth increases the dissimilarity between PCR replicates unless the bioinformatic strategies to remove putatively artefactual sequences are adjusted to the number of analysed sequences. Finally, we show that the appropriate identity thresholds for OTU clustering and taxonomy assignment differ between target markers.

4. Metabarcoding of complex environmental samples ideally requires i) investigation of whether more than one primer set targeting the same taxonomic group is needed to offset the effect of primer biases, ii) more than one PCR replicate per sample, iii) bioinformatic sequence-processing approaches that balance diversity detection with the removal of artefactual sequences, and iv) empirical selection of OTU clustering and taxonomy assignment thresholds tailored to each genetic marker and the taxa obtained.
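The additive and restrictive PCR replicate processing strategies and the minimum copy-number filter described in the abstract can be illustrated with a short Python sketch. It assumes each PCR replicate of a sample has already been dereplicated into a mapping from sequence to read count (copy number). The function names, the default of two replicates for the restrictive strategy, the choice to apply the copy-number filter before checking replicate presence, and the use of Jaccard dissimilarity to compare replicates are all illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter
from itertools import combinations

# Each PCR replicate is a Counter mapping a dereplicated sequence to its
# read count (copy number). All names here are hypothetical.


def filter_copy_number(replicate, min_copies):
    """Drop sequences seen fewer than `min_copies` times in one replicate."""
    return Counter({seq: n for seq, n in replicate.items() if n >= min_copies})


def additive(replicates, min_copies=1):
    """Additive strategy: pool the retained sequences of all PCR replicates."""
    pooled = Counter()
    for rep in replicates:
        pooled.update(filter_copy_number(rep, min_copies))  # sums read counts
    return pooled


def restrictive(replicates, min_replicates=2, min_copies=1):
    """Restrictive strategy: keep only sequences that pass the copy-number
    filter in at least `min_replicates` PCR replicates (assumed order)."""
    filtered = [filter_copy_number(rep, min_copies) for rep in replicates]
    presence = Counter(seq for rep in filtered for seq in rep)
    pooled = additive(filtered)
    return Counter(
        {seq: n for seq, n in pooled.items() if presence[seq] >= min_replicates}
    )


def jaccard_dissimilarity(a, b):
    """1 - |A & B| / |A | B| over the sets of sequences in two replicates."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return 0.0 if not union else 1 - len(sa & sb) / len(union)


def mean_pairwise_dissimilarity(replicates):
    """Average Jaccard dissimilarity over all pairs of PCR replicates."""
    pairs = list(combinations(replicates, 2))
    if not pairs:
        raise ValueError("need at least two PCR replicates")
    return sum(jaccard_dissimilarity(a, b) for a, b in pairs) / len(pairs)
```

Raising `min_copies` or `min_replicates` discards more low-abundance, potentially artefactual sequences at the cost of detected diversity; the sketch only makes that trade-off explicit and does not reproduce the thresholds used in the study.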
Epp PCR replicate 1 - PE2
Compressed raw sequence output file. Contains the following data: Epp primer set, 1st PCR replicate, reverse read.
epp_PCR1_PE2.fastq.gz

Epp PCR replicate 3 - PE2
Compressed raw sequence output file. Contains the following data: Epp primer set, 3rd PCR replicate, reverse read.
epp_PCR3_PE2.fastq.gz

Epp PCR replicate 3 - PE1
Compressed raw sequence output file. Contains the following data: Epp primer set, 3rd PCR replicate, forward read.
epp_PCR3_PE1.fastq.gz

Epp PCR replicate 1 - PE1
Compressed raw sequence output file. Contains the following data: Epp primer set, 1st PCR replicate, forward read.
epp_PCR1_PE1.fastq.gz

Epp PCR replicate 2 - PE1
Compressed raw sequence output file. Contains the following data: Epp primer set, 2nd PCR replicate, forward read.
epp_PCR2_PE1.fastq.gz

Epp PCR replicate 2 - PE2
Compressed raw sequence output file. Contains the following data: Epp primer set, 2nd PCR replicate, reverse read.
epp_PCR2_PE2.fastq.gz

Clarke PCR replicate 1 - PE2
Compressed raw sequence output file. Contains the following data: Clarke primer set, 1st PCR replicate, reverse read.
clarke_PCR1_PE2.fastq.gz

Clarke PCR replicate 1 - PE1
Compressed raw sequence output file. Contains the following data: Clarke primer set, 1st PCR replicate, forward read.
clarke_PCR1_PE1.fastq.gz

Clarke PCR replicate 2 - PE1
Compressed raw sequence output file. Contains the following data: Clarke primer set, 2nd PCR replicate, forward read.
clarke_PCR2_PE1.fastq.gz

Clarke PCR replicate 2 - PE2
Compressed raw sequence output file. Contains the following data: Clarke primer set, 2nd PCR replicate, reverse read.
clarke_PCR2_PE2.fastq.gz

Clarke PCR replicate 3 - PE2
Compressed raw sequence output file. Contains the following data: Clarke primer set, 3rd PCR replicate, reverse read.
clarke_PCR3_PE2.fastq.gz

Clarke PCR replicate 3 - PE1
Compressed raw sequence output file. Contains the following data: Clarke primer set, 3rd PCR replicate, forward read.
clarke_PCR3_PE1.fastq.gz

Leray PCR replicate 1 - PE1
Compressed raw sequence output file. Contains the following data: Leray primer set, 1st PCR replicate, forward read.
leray_PCR1_PE1.fastq.gz

Leray PCR replicate 1 - PE2
Compressed raw sequence output file. Contains the following data: Leray primer set, 1st PCR replicate, reverse read.
leray_PCR1_PE2.fastq.gz

Leray PCR replicate 2 - PE1
Compressed raw sequence output file. Contains the following data: Leray primer set, 2nd PCR replicate, forward read.
leray_PCR2_PE1.fastq.gz

Leray PCR replicate 2 - PE2
Compressed raw sequence output file. Contains the following data: Leray primer set, 2nd PCR replicate, reverse read.
leray_PCR2_PE2.fastq.gz

Leray PCR replicate 3 - PE1
Compressed raw sequence output file. Contains the following data: Leray primer set, 3rd PCR replicate, forward read.
leray_PCR3_PE1.fastq.gz

Leray PCR replicate 3 - PE2
Compressed raw sequence output file. Contains the following data: Leray primer set, 3rd PCR replicate, reverse read.
leray_PCR3_PE2.fastq.gz

Zeale PCR replicate 3 - PE2
Compressed raw sequence output file. Contains the following data: Zeale primer set, 3rd PCR replicate, reverse read.
zeale_PCR3_PE2.fastq.gz

Zeale PCR replicate 3 - PE1
Compressed raw sequence output file. Contains the following data: Zeale primer set, 3rd PCR replicate, forward read.
zeale_PCR3_PE1.fastq.gz

Zeale PCR replicate 2 - PE2
Compressed raw sequence output file. Contains the following data: Zeale primer set, 2nd PCR replicate, reverse read.
zeale_PCR2_PE2.fastq.gz

Zeale PCR replicate 2 - PE1
Compressed raw sequence output file. Contains the following data: Zeale primer set, 2nd PCR replicate, forward read.
zeale_PCR2_PE1.fastq.gz

Zeale PCR replicate 1 - PE2
Compressed raw sequence output file. Contains the following data: Zeale primer set, 1st PCR replicate, reverse read.
zeale_PCR1_PE2.fastq.gz

Zeale PCR replicate 1 - PE1
Compressed raw sequence output file. Contains the following data: Zeale primer set, 1st PCR replicate, forward read.
zeale_PCR1_PE1.fastq.gz
Europe