Data from: Metabarcoding of soil environmental DNA replicates plant community variation but not specificity

DOI: 10.5061/DRYAD.XGXD254J2
Authors: Frøslev, Tobias (ORCID 0000-0002-3530-013X), University of Copenhagen; Barnes, Christopher, University of Copenhagen
Publisher: Dryad
Resource type: dataset
Publication year: 2022
Subjects: FOS: Biological sciences; Environmental metabarcoding; field survey data; Biodiversity; biodiversity assessment
Funder: The Velux Foundations (https://ror.org/007ww2d15), award VKR-023343
Date: 2022-02-25T00:00:00Z
Language: en
Related identifiers: https://www.nature.com/articles/s41467-017-01312-x; https://doi.org/10.5061/dryad.n9077; https://doi.org/10.1186/s12898-019-0260-x
Size: 2182018011 bytes
Version: 3
License: Creative Commons Zero v1.0 Universal

While metabarcoding of plant DNA from the environment is an exciting method that can supplement inventorying of live plant species, its accuracy and specificity have yet to be fully assessed over complex continuous landscapes. In this work, we evaluate plant community profiles produced via metabarcoding of soil by comparing them to a morphological survey. We assessed plant communities by metabarcoding of soil DNA in 130 sites along ecological gradients (nutrients, succession, moisture) in Denmark using a primer set targeting the chloroplast trnL region (10-143 bp), and compared the resulting communities to communities produced with a longer nuclear ITS2 region (~216 bp) and to a morphological survey. We found that the community variation observed within the morphological survey was well represented by the molecular surveys, with significant correlations in both community composition and richness for both primer sets. While the majority of the ITS2 sequences could be assigned to species (over 80%), we had less success with the trnL sequences (70%), and that level was only possible after restricting the reference database to local species. We conclude that the community profiles produced by metabarcoding can be highly effective in performing large-scale macroecological studies.
However, the discovery rates and taxonomic assignments produced via metabarcoding remained inferior to those of morphological surveys. Manual curation of databases improves the specificity of assignments made by the trnL primers, and improves the accuracy of the assignments made with the ITS2 primers. Finally, we suggest that a greater percentage of named diversity would be recovered by increasing soil sampling with the use of additional universal primer sets.

Sampling was performed across 130 sites (40 m x 40 m) in Denmark. For this study we generated new sequence data for trnL from existing DNA extracts, used already published sequence data for ITS2 (from the same DNA extracts), and combined these with published plant survey data from the same study sites. Detailed materials and methods are given in the associated publication and in Frøslev et al. (2017) and Brunbjerg et al. (2019).

This repository holds the following material:

TRNL SEQUENCE DATA:
A) trnl_fastq.tar.gz - Sequence data. Raw trnL sequencing data from MiSeq: 6 sequencing libraries (R1 + R2) with multiplexed primers, approximately 67-71 samples (PCR products) per library.
B) trnl_taglists.zip - PCR tagging. One file per library with the tag pairs used. Each tag is a 6 bp oligo preceding the primer.
C) trnl_replicate_Info.csv - PCR replicates. One file with PCR numbers (S001 and up) and corresponding sample numbers (like SN081). Each of the 130 samples was amplified in three PCR replicates.

ITS2 SEQUENCE DATA:
D) Raw ITS2 sequence data can be downloaded here: https://doi.org/10.5061/dryad.n9077
E) trnl_taglists.zip - PCR tagging. One file per library with the tag pairs used. Each tag is a 6 bp oligo preceding the primer.
F) Its2_replicate_Info.csv - PCR replicates. One file with PCR numbers (S001 and up) and corresponding sample numbers (like SN081).
Each of the 130 samples was amplified in three PCR replicates.

PROCESSED DATA:
G) community_tables.zip - Taxonomically annotated tables used for the analyses. 7 tables in rds format (read in R with the function readRDS). For trnL and ITS2 there are tables annotated with local, regional and global reference databases, respectively. Each table contains: read counts for each OTU in each of the 130 sites; OTU_ID = a sha1 hash of the sequence; sequence = DNA sequence of the OTU; seq_len = length of the sequence; pident = match with reference sequence; taxonomic affiliation at 6 levels; and a field indicating whether the annotation is taxonomically redundant. The archive also includes a table with the inventory (survey) data.

References:
Frøslev, T.G., R. Kjøller, H.H. Bruun, R. Ejrnæs, A.K. Brunbjerg, C. Pietroni and A.J. Hansen. 2017. Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications 8(1): 1–11.
Brunbjerg, A.K., H.H. Bruun, K. Brøndum, A.T. Classen, L. Dalby, K. Fog, T.G. Frøslev, I. Goldberg, et al. 2019. A systematic survey of regional multi-taxon biodiversity: evaluating strategies and coverage. BMC Ecology 19(1): 1–15.
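
Note on OTU_ID: the processed tables describe OTU_ID as a sha1 hash of the OTU sequence. The following is a minimal Python sketch of how such an identifier could be recomputed from the sequence field, for example to check that an ID and a sequence belong together. The function name otu_id, the upper-casing and whitespace stripping, and the hexadecimal output format are assumptions made for illustration; the record does not state how the sequences were normalized or how the hash was encoded, so recomputed values may not match the OTU_ID column exactly.

```python
import hashlib


def otu_id(sequence: str) -> str:
    """Return the SHA-1 hex digest of a DNA sequence.

    Assumption: whitespace is removed and the sequence is upper-cased
    before hashing. The dataset may have used a different normalization.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.sha1(normalized.encode("ascii")).hexdigest()


if __name__ == "__main__":
    # Illustrative sequence only, not taken from the dataset.
    example = "ACGTACGT"
    print(otu_id(example))
```

If the recomputed digests differ from the values in the tables, the most likely causes are a different letter case or a different digest encoding, and both can be adjusted in the normalization step.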