10.5061/DRYAD.HHMGQNKGG
Lauer, Eddie
0000-0001-8846-6232
North Carolina State University
Variant discovery in full-sibling families of Pinus taeda L
Dryad
dataset
2021
FOS: Agriculture, forestry, and fisheries
Freebayes
Variant Call Format
variant calling
National Institute of Food and Agriculture
https://ror.org/05qx3fv49
2019-67013-29169
2021-04-10T00:00:00Z
2021-04-10T00:00:00Z
en
642865125 bytes
2
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Fusiform rust disease, caused by the endemic fungus Cronartium quercuum f.
sp. fusiforme, is the most damaging disease affecting economically
important pine species in the southeast United States. In this report, we
detail the genomic localization and sequence-level discovery of candidate
race-nonspecific broad-spectrum fusiform rust resistance genes in Pinus
taeda L. Two full-sib families, each with ~1000 progeny, were challenged
with a complex inoculum consisting of over 150 pathogen isolates.
High-density linkage mapping revealed three QTL distributed on two linkage
groups. The two QTL on linkage group 2 were additive with respect to their
effects on the probability of disease outcome. All three QTL were
validated using a population of 2057 cloned pine genotypes in a
six-year-old multi-environmental field trial. As a complement to the QTL
mapping approach, bulked segregant RNAseq analysis revealed a small
number of candidate nucleotide-binding leucine-rich repeat genes harboring
SNPs significantly associated with disease resistance. The results of this
study demonstrate that single qualitative resistance genes can confer
effective resistance against genetically diverse mixtures of an endemic
pathogen.
A total of 15 samples had adequate quantity and quality of RNA for library
preparation. Each sample was sequenced on two lanes of an S2 flow cell of
the NovaSeq6000 Illumina sequencer, resulting in 7.6 × 10^9 50 bp paired-end
reads. For each sample, reads originating from lanes 1 and 2 were combined
into a single fastq file for each mate, and aligned to the PacBio
reference transcriptome using bwa mem with the default options (Li,
2013). Around 70% of the read pairs from each sample had both mates
properly mapped to the transcriptome, with an average quality score of
35.8, an average insert size of ~275bp, and an average depth of 163x.
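The lane-merging and alignment step described above can be sketched as follows. This is a minimal illustration, not the authors' actual script: the helper names (merge_lanes, bwa_mem_command) and all file names are hypothetical, and the command is only constructed, not executed. The only bwa usage assumed is the standard paired-end form "bwa mem <reference> <mate1> <mate2>" with no extra options.

```python
import shutil
from pathlib import Path


def merge_lanes(lane_files, merged_path):
    """Concatenate the per-lane FASTQ files of one mate into a single file.

    Byte-wise concatenation is valid for plain FASTQ and for gzip-compressed
    FASTQ, because concatenated gzip members form a valid gzip stream.
    """
    with open(merged_path, "wb") as out:
        for lane_file in lane_files:
            with open(lane_file, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(merged_path)


def bwa_mem_command(reference, mate1, mate2):
    """Build the argument list for a paired-end bwa mem run with default options.

    bwa mem writes SAM to standard output, so a caller would redirect it to a
    file or pipe it into a SAM/BAM converter.
    """
    return ["bwa", "mem", str(reference), str(mate1), str(mate2)]
```

A caller would merge lanes 1 and 2 separately for each mate, then pass the two merged files to bwa_mem_command and run the result with subprocess.run, redirecting standard output to a SAM file.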
Following alignment, variants were called using Freebayes version 0.9.6
(Garrison & Marth, 2012). All bam files from a single family were
combined in a single variant discovery run. Since each sample represented
a bulk of 100 (for the random bulks) or 50 (for the disease-status bulks)
individuals, the population model was specified as ‘pooled’ using the ‘-J’
option. Complex alleles of up to 25 bp were allowed using the
‘--max-complex-gap’ option. Biallelic SNPs with a minimum of 10
observations of the alternate allele were considered for downstream
analysis.
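The variant-calling options and the downstream SNP filter can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline. The function names and file names are hypothetical. The command uses only the options named in the text plus freebayes' standard '-f' reference option, and any other settings the authors may have used (for example ploidy) are unknown and omitted. The filter assumes the 10-observation threshold applies to the site-level AO (alternate allele observations) value that freebayes writes to the VCF INFO column; the authors may have applied it differently, for example per sample.

```python
def freebayes_command(reference, bam_files, max_complex_gap=25):
    """Build a freebayes argument list for one family's pooled variant call."""
    return [
        "freebayes",
        "-f", str(reference),                       # reference FASTA
        "-J",                                       # pooled population model
        "--max-complex-gap", str(max_complex_gap),  # complex alleles up to 25 bp
        *[str(bam) for bam in bam_files],           # all bulks from one family
    ]


def alt_observations(info_field):
    """Return the integer AO value from a VCF INFO string.

    Returns None if AO is missing or holds several comma-separated values,
    which happens at multiallelic sites.
    """
    for entry in info_field.split(";"):
        key, _, value = entry.partition("=")
        if key == "AO" and value.isdigit():
            return int(value)
    return None


def is_candidate_snp(vcf_line, min_alt_obs=10):
    """True for a biallelic SNP record with at least min_alt_obs ALT observations."""
    if vcf_line.startswith("#"):
        return False  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 8:
        return False  # not a complete VCF record
    ref, alt, info = fields[3], fields[4], fields[7]
    bases = {"A", "C", "G", "T"}
    if ref not in bases or alt not in bases:
        return False  # multiallelic, indel, MNP, or complex allele
    ao = alt_observations(info)
    return ao is not None and ao >= min_alt_obs
```

Applied line by line to one of the .vcf files, is_candidate_snp keeps only single-base, single-ALT records. Records with several ALT alleles are rejected by the base check before the AO value is read.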
Sample IDs appearing in the .vcf files are described below.

Family E4
S15 random bulk collected prior to inoculation (100 full-sib)
S11 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S19 non-diseased bulk collected 10 months post-inoculation (100 full-sib)
S2 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S18 diseased bulk collected 10 months post-inoculation (100 full-sib)
S1 diseased bulk collected 7 months post-inoculation (50 full-sib)
S5 diseased bulk collected 7 months post-inoculation (50 full-sib)

Family E9
S16 random bulk collected prior to inoculation (100 full-sib)
S12 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S17 non-diseased bulk collected 10 months post-inoculation (100 full-sib)
S3 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S7 non-diseased bulk collected 7 months post-inoculation (50 full-sib)
S20 diseased bulk collected 10 months post-inoculation (100 full-sib)
S4 diseased bulk collected 7 months post-inoculation (50 full-sib)
S8 diseased bulk collected 7 months post-inoculation (50 full-sib)
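
For users who want to pick sample columns from the .vcf files programmatically, the sample key above can be encoded as a small lookup table. This is only a convenience sketch: the names Bulk, SAMPLES, and select_samples are hypothetical and are not part of the dataset. The values are transcribed from the sample descriptions, with None marking the random bulks collected prior to inoculation.

```python
from collections import namedtuple

# family, disease status, months post-inoculation (None = pre-inoculation),
# and number of full-sib individuals in the bulk
Bulk = namedtuple("Bulk", ["family", "status", "months", "n_individuals"])

SAMPLES = {
    "S15": Bulk("E4", "random", None, 100),
    "S11": Bulk("E4", "non-diseased", 7, 50),
    "S19": Bulk("E4", "non-diseased", 10, 100),
    "S2": Bulk("E4", "non-diseased", 7, 50),
    "S18": Bulk("E4", "diseased", 10, 100),
    "S1": Bulk("E4", "diseased", 7, 50),
    "S5": Bulk("E4", "diseased", 7, 50),
    "S16": Bulk("E9", "random", None, 100),
    "S12": Bulk("E9", "non-diseased", 7, 50),
    "S17": Bulk("E9", "non-diseased", 10, 100),
    "S3": Bulk("E9", "non-diseased", 7, 50),
    "S7": Bulk("E9", "non-diseased", 7, 50),
    "S20": Bulk("E9", "diseased", 10, 100),
    "S4": Bulk("E9", "diseased", 7, 50),
    "S8": Bulk("E9", "diseased", 7, 50),
}


def select_samples(family, status):
    """Return the sample IDs of one family with the given disease status."""
    return [
        sample_id
        for sample_id, bulk in SAMPLES.items()
        if bulk.family == family and bulk.status == status
    ]
```

For example, select_samples("E4", "diseased") lists the diseased bulks of family E4, which can then be matched against the sample columns in that family's .vcf header.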