10.5061/DRYAD.8091Q
Shortt, Jonathan A.
Department of Biochemistry & Molecular Genetics, University of
Colorado School of Medicine, Aurora, CO, United States of America
Card, Daren C.
The University of Texas at Arlington
Schield, Drew R.
The University of Texas at Arlington
Liu, Yang
Institute of Parasitic Disease, Sichuan Center for Disease Control and
Prevention, Chengdu, The People’s Republic of China
Zhong, Bo
Institute of Parasitic Disease, Sichuan Center for Disease Control and
Prevention, Chengdu, The People’s Republic of China
Castoe, Todd A.
The University of Texas at Arlington
Carlton, Elizabeth J.
University of Colorado Boulder
Pollock, David D.
Department of Biochemistry & Molecular Genetics, University of
Colorado School of Medicine, Aurora, CO, United States of America
Data from: Whole genome amplification and reduced-representation genome
sequencing of Schistosoma japonicum miracidia
Dryad
dataset
2018
2018-01-10T00:00:00Z
2018-01-10T00:00:00Z
en
https://doi.org/10.1371/journal.pntd.0005292
736364632 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Background: In areas where schistosomiasis control programs have been
implemented, morbidity and prevalence have been greatly reduced. However,
to sustain these reductions and move towards interruption of transmission,
new tools for disease surveillance are needed. Genomic methods have the
potential to help trace the sources of new infections, and allow us to
monitor drug resistance. Large-scale genotyping efforts for schistosome
species have been hindered by cost, limited numbers of established target
loci, and the small amount of DNA obtained from miracidia, the life stage
most readily acquired from humans. Here, we present a method using
next-generation sequencing to provide high-resolution genomic data from S.
japonicum for population-based studies. Methodology/Principal Findings: We
applied whole genome amplification followed by double digest restriction
site associated DNA sequencing (ddRADseq) to individual S. japonicum
miracidia preserved on Whatman FTA cards. We found that we could
effectively and consistently survey hundreds of thousands of variants from
10,000 to 30,000 loci from archived miracidia as old as six years. An
analysis of variation from eight miracidia obtained from three hosts in
two villages in Sichuan showed clear population structuring by village and
host even within this limited sample. Conclusions/Significance: This
high-resolution sequencing approach yields three orders of magnitude more
information than microsatellite genotyping methods that have been employed
over the last decade, creating the potential to answer detailed questions
about the sources of human infections and to monitor drug resistance.
Costs per sample range from $50-$200, depending on the amount of sequence
information desired, and we expect these costs can be reduced further
given continued reductions in sequencing costs, improvement of protocols,
and parallelization. This approach provides new promise for using modern
genome-scale sampling in S. japonicum surveillance, and could be applied
to other schistosome species and other parasitic helminths.
adult_worm_and_eight_miracidia_stringent_filter
VCF comprising stringently filtered variants found by ddRADseq in an S. japonicum adult worm and eight miracidia samples.
File: all_samples.vcf

mircidia_only_stringent_filter
VCF comprising stringently filtered variants found by ddRADseq in 8 S. japonicum miracidia samples.
File: all_miracidia.vcf

miracidia_missing25_stringent_filter
VCF comprising variants found by ddRADseq in at least 75% of eight S. japonicum miracidia.
File: miracidia_missing25.vcf

miracidia_only_loose_filter
VCF comprising loosely filtered variants from ddRADseq of eight S. japonicum miracidia.
File: miracidia_all_samples_filtered_variable.vcf

haplotype_caller_unfiltered
Unfiltered VCF from an S. japonicum adult worm and 8 miracidia, generated using the GATK haplotype caller.
File: output.all.test1.gvcf

unified_genotyper_all_samples_unfiltered
Unfiltered VCF from an S. japonicum adult worm and 8 miracidia, generated using the GATK unified genotyper.
File: schisto_align.merged.bam.realigned.bam.filtered-variants.vcf

multi_cov_by_len
Used to determine coverage of features in a bed file using the alignments in a bam file. Output lists the length of features, the number of features of that length, and the percentage of features of that length recovered in the bam file. The script reads in output from the bedtools coverage command executed as follows:
bedtools coverage -hist -abam [bam_infile] -b [bed_file] > [output]

proportion_het_v2
Returns the proportion of heterozygous variants for each sample in a .vcf.

var_sharing_sampling
Calculates pairwise similarity of genotypes between samples of a vcf.

check_radtag_from_sam
Counts the number of times a provided restriction site sequence is found at the beginning of a sequence read from a .sam file.

cutgenome.v0.04
Returns a .bed file with double-digested fragments from a provided genome and cut sites.

S4Table_microsatellite_primers
File: S4Table.xlsx

vcffilter.py
Filters a vcf. Removes monomorphic or non-bi-allelic variants, and codes individual genotypes as missing data if the genotype quality score is below 20 or individual read depth is below 10.
File: vcffilter
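
The filtering rule described for vcffilter.py can be illustrated with a minimal sketch. This is not the deposited script. The function names mask_genotype and filter_vcf_lines are hypothetical, and the sketch makes several assumptions that the description does not state: GT is the first FORMAT key, a missing GQ or DP value counts as failing the threshold, and the monomorphic check is applied after low-quality genotypes have been masked.

```python
import re


def mask_genotype(sample_field, fmt_keys, min_gq=20, min_dp=10):
    """Set GT to missing when GQ or DP falls below its threshold.

    Assumption: a GQ or DP value that is absent or "." is treated as failing.
    """
    values = sample_field.split(":")
    fields = dict(zip(fmt_keys, values))

    def too_low(key, minimum):
        raw = fields.get(key, ".")
        return raw == "." or float(raw) < minimum

    if too_low("GQ", min_gq) or too_low("DP", min_dp):
        values[fmt_keys.index("GT")] = "./."
    return ":".join(values)


def filter_vcf_lines(lines, min_gq=20, min_dp=10):
    """Return header lines plus records that stay bi-allelic and polymorphic."""
    kept = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            kept.append(line)  # header lines pass through unchanged
            continue
        cols = line.split("\t")
        if "," in cols[4]:
            continue  # more than one ALT allele: not bi-allelic
        fmt_keys = cols[8].split(":")
        samples = [mask_genotype(s, fmt_keys, min_gq, min_dp) for s in cols[9:]]
        # Collect the alleles still observed among called genotypes
        # (assumes GT is the first FORMAT key).
        alleles = set()
        for s in samples:
            gt = s.split(":")[0]
            alleles.update(a for a in re.split(r"[/|]", gt) if a != ".")
        if len(alleles) < 2:
            continue  # monomorphic after masking (an assumption about ordering)
        kept.append("\t".join(cols[:9] + samples))
    return kept
```

Under these assumptions, a record in which one sample has a read depth of 5 keeps its position only if the remaining called genotypes still carry both the reference and the alternate allele. The thresholds of 20 and 10 are the ones given in the description and are exposed as parameters.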