10.5061/DRYAD.XGXD254J2
Frøslev, Tobias
0000-0002-3530-013X
University of Copenhagen
Barnes, Christopher
University of Copenhagen
Data from: Metabarcoding of soil environmental DNA replicates plant
community variation but not specificity
Dryad
dataset
2022
FOS: Biological sciences
Environmental metabarcoding
field survey data
Biodiversity
biodiversity assessment
The Velux Foundations
https://ror.org/007ww2d15
VKR-023343
2022-02-25T00:00:00Z
2022-02-25T00:00:00Z
en
https://www.nature.com/articles/s41467-017-01312-x
https://doi.org/10.5061/dryad.n9077
https://doi.org/10.1186/s12898-019-0260-x
2181937229 bytes
3
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
While metabarcoding of plant DNA from the environment is an exciting
method that can supplement inventorying of live plant species, its
accuracy and specificity have yet to be fully assessed over complex
continuous landscapes. In this work, we evaluate plant community profiles
produced via metabarcoding of soil by comparing them to a morphological
survey. We assessed plant communities by metabarcoding of soil DNA in 130
sites along ecological gradients (nutrients, succession, moisture) in
Denmark using a primer set targeting the chloroplast trnL region
(10-143 bp), and compared the resulting communities with those obtained
from a longer nuclear ITS2 region (~216 bp) and from a morphological
survey. We found that the
community variation observed within the morphological survey was well
represented by molecular surveys, with significant correlation with both
community composition and richness using both primer sets. While the
majority of the ITS2 sequences could be assigned to species (over 80%), we
had less success with the trnL sequences (70%), and even that rate was
only achieved after restricting the reference database to local species.
We conclude
that the community profiles produced by metabarcoding can be highly
effective in performing large-scale macroecological studies. However, the
discovery rates and taxonomic assignments produced via metabarcoding
remained inferior to those of morphological surveys. Manual curation of
databases improves the specificity of assignments made by the trnL
primers, and improves the accuracy of the assignments made with the ITS2
primers. Finally, we suggest that a greater percentage of named diversity
would be recovered by increasing soil sampling with the use of additional
universal primer sets.
Sampling was performed across 130 sites (40 m x 40 m) in Denmark. For this
study we generated new trnL sequence data from existing DNA extracts,
used previously published ITS2 sequence data (from the same DNA
extracts), and combined both with published plant survey data from the
same study sites. Detailed materials and methods are given in the
associated publication and in Frøslev et al. (2017) and Brunbjerg et al.
(2019).

This repository holds the following material:

TRNL SEQUENCE DATA:
A) trnl_fastq.tar.gz – Sequence data. Raw trnL sequencing data from MiSeq.
   6 sequencing libraries (R1 + R2), with multiplexed primers,
   approximately 67-71 samples (PCR products) per library.
B) trnl_taglists.zip – PCR tagging. One file per library with the tag
   pairs used. Each tag is a 6 bp oligo preceding the primer.
C) trnl_replicate_Info.csv – PCR replicates. One file with PCR numbers
   (S001 and up) and corresponding sample numbers (like SN081). Each of
   the 130 samples was amplified in three PCR replicates.

ITS2 SEQUENCE DATA:
D) Raw ITS2 sequence data can be downloaded here:
   https://doi.org/10.5061/dryad.n9077
E) trnl_taglists.zip – PCR tagging. One file per library with the tag
   pairs used. Each tag is a 6 bp oligo preceding the primer.
F) Its2_replicate_Info.csv – PCR replicates. One file with PCR numbers
   (S001 and up) and corresponding sample numbers (like SN081). Each of
   the 130 samples was amplified in three PCR replicates.

PROCESSED DATA:
G) community_tables.zip – Taxonomically annotated tables used for the
   analyses. 7 tables in rds format (read in R with the function
   readRDS). For trnL and ITS2 there are tables annotated with local,
   regional and global reference databases, respectively. Each table
   contains: read counts for each OTU in each of the 130 sites; OTU_ID =
   a SHA-1 hash of the sequence; sequence = DNA sequence of the OTU;
   seq_len = length of the sequence; pident = percent identity of the
   match with the reference sequence; taxonomic affiliation at 6 levels;
   and a field indicating whether the annotation is taxonomically
   redundant. In addition, there is a table with the inventory (survey)
   data.

References:
Frøslev, T.G., R. Kjøller, H.H. Bruun, R. Ejrnæs, A.K. Brunbjerg, C.
Pietroni and A.J. Hansen. 2017. Algorithm for post-clustering curation of
DNA amplicon data yields reliable biodiversity estimates. Nature
Communications 8(1): 1–11.
Brunbjerg, A.K., H.H. Bruun, K. Brøndum, A.T. Classen, L. Dalby, K. Fog,
T.G. Frøslev, I. Goldberg, et al. 2019. A systematic survey of regional
multi-taxon biodiversity: evaluating strategies and coverage. BMC Ecology
19(1): 1–15.
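
Note on OTU_ID: the community tables identify each OTU by a SHA-1 hash of
its sequence. The sketch below shows one way such an identifier could be
recomputed from the sequence field, for example to check that a sequence
and its ID belong together. It is an illustration only: the function name
otu_id, the upper-casing and whitespace stripping, and the use of the
hexadecimal digest are assumptions, not details confirmed by this record,
and the example sequence is made up.

```python
import hashlib


def otu_id(sequence: str) -> str:
    """Return a SHA-1 hex digest for a DNA sequence.

    Assumption: the hash is taken over the upper-case sequence with no
    surrounding whitespace or newline. The dataset's exact convention
    is not stated in this record.
    """
    normalized = sequence.strip().upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


if __name__ == "__main__":
    example = "ACGTACGTTAGC"  # hypothetical sequence, not from the dataset
    print(example, otu_id(example))
```

If the recomputed values do not match the OTU_ID column, the tables were
most likely hashed under a different convention (for example, a different
letter case), and the normalization step would need to be adjusted.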