10.5061/DRYAD.6WWPZGMWG
Labate, Joanne
0000-0002-7863-4228
United States Department of Agriculture
Glaubitz, Jeffrey
Cornell University
Havey, Michael
0000-0003-4443-9376
United States Department of Agriculture
Onion (Allium cepa) pseudoreference genome
Dryad
dataset
2020
Agricultural Research Service
https://ror.org/02d2m2044
CRIS Project No. 8060-21000-027-00-D
National Institute of Food and Agriculture
https://ror.org/05qx3fv49
SCRI grant 2008-51180-04875
2020-08-13T00:00:00Z
2020-08-13T00:00:00Z
en
7229909 bytes
2
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Onion (Allium cepa) is not highly tractable for development of molecular
markers due to its large (16 gigabases per 1C) nuclear genome. Single
nucleotide polymorphisms (SNPs) are useful for genetic characterization
and marker-aided selection of onion because of codominance and common
occurrence in elite germplasm. We completed genotyping by sequencing
(GBS) to identify SNPs in onion using 46 F2 plants, parents of the F2
plants (Ailsa Craig 43 and Brigham Yellow Globe 15-23), two doubled
haploid (DH) lines (DH2107 and DH2110), and plants from 94 accessions in
the USDA National Plant Germplasm System (NPGS). SNPs were called using
the TASSEL 3.0 Universal Network Enabled Analysis (UNEAK) bioinformatics
pipeline. Sequences from the F2 and DH plants were used to construct a
pseudo-reference genome against which genotypes from all accessions were
scored. Quality filters were used to identify a set of 284 high-quality
SNPs, which were placed onto an existing genetic map for the F2 family.
Accessions showed a moderate level of diversity (mean He = 0.341) and
evidence of inbreeding (mean F = 0.592). GBS is promising for SNP
discovery in onion, although the lack of a reference genome required
extensive custom scripts for bioinformatics analyses to identify
high-quality markers.
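As an illustration of the diversity statistics reported above, the following Python sketch computes per-locus expected heterozygosity (He), observed heterozygosity (Ho), and the inbreeding coefficient F = 1 - Ho/He from diploid genotype calls. The estimators actually used for this dataset are not stated in this record, so the formulas, function names, and input layout below are assumptions made for illustration only.

```python
from collections import Counter

# Assumed input layout (hypothetical): one locus is a list of diploid
# genotypes, each a 2-tuple of allele strings, with None for missing calls.


def expected_heterozygosity(genotypes):
    """Simple (uncorrected) He = 1 - sum(p_i^2) over allele frequencies."""
    alleles = [a for g in genotypes if g is not None for a in g]
    n = len(alleles)
    if n == 0:
        return None
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes):
    """Fraction of called genotypes that are heterozygous."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(1 for a, b in called if a != b) / len(called)


def inbreeding_coefficient(genotypes):
    """F = 1 - Ho/He; returns None when He is zero or undefined."""
    he = expected_heterozygosity(genotypes)
    ho = observed_heterozygosity(genotypes)
    if he is None or ho is None or he == 0:
        return None
    return 1.0 - ho / he
```

Averaging these per-locus values across SNPs would give summary figures comparable in kind to the mean He and mean F quoted above, although small-sample corrections, if any were applied to the published values, are not reproduced here.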
Forty-six F2 plants and the two parents of the onion (Allium cepa) mapping
population Brigham Yellow Globe 15-23 x Ailsa Craig 43 were genotyped,
along with two doubled haploid (DH) onion lines, DH2107 and DH2110, which
served as completely homozygous controls. Genotyping by sequencing (GBS) was
performed using an Illumina HiSeq 2000 on two to four replicates of every
DNA sample. GBS libraries were prepared at Cornell University’s Genomic
Diversity Facility using the restriction enzyme EcoT22I and assayed in
96-plex format using standard protocols.  SNP calling on the 46 F2
plants, two parents, and two DH lines was performed using the TASSEL 3.0
Universal Network Enabled Analysis (UNEAK) bioinformatics pipeline, which
does not require a reference genome. Over 70,000 raw SNPs were scored in
these samples. Quality filters were then applied to SNPs as follows: not
heterozygous in either DH line, minor allele frequency greater than or
equal to 30%, minimum genotypic read depth of seven, maximum missing data
of 10%, and conforming to the expected 1:2:1 segregation ratio
(goodness-of-fit > 0.01) within the F2 family.  For the resulting
752 SNPs, the MSTMap software tool was used to construct a genetic linkage
map using a grouping LOD criterion of p < 1 x 10^-7.  This gave 701 SNPs
in 15 linkage groups (LG) with ≥ 15 markers each (the remaining 51 markers
were not placed on a linkage group). The number of SNPs per LG ranged
from 15 to 90, and the estimated size of LGs ranged from 52 to 327 cM. 
Because UNEAK treats redundant, reverse complement tags from opposite
strands as separate markers, 171 redundant tag pairs were eliminated from
this linkage map. A pseudo-reference genome was constructed consisting of
one tag from each of the 530 non-redundant, mapped tag pairs concatenated
together into a single pseudo-molecule. To prevent spurious alignment
across two distinct pseudo-reference tags, adjacent tags in the
pseudo-reference were separated by a spacer of at least 32 A nucleotides. The
purpose of the pseudo-reference was to allow discovery of additional SNPs
within each tag pair locus in 94 diverse onion accessions that were not
segregating in the mapping population, thereby reducing the ascertainment
bias that would result if a population survey relied only on SNPs
discovered in a single F2 family.
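The pseudo-reference construction described above can be sketched in Python as follows. This is a minimal illustration, not the custom scripts used for the dataset: it assumes redundant tags can be recognized as exact reverse complements of one another (the record does not say how redundancy was detected), uses a spacer of exactly 32 A nucleotides, and all function names and the (name, sequence) input layout are hypothetical.

```python
# Spacer of 32 A nucleotides placed between adjacent tags so that a read
# cannot align across two distinct tags.
SPACER = "A" * 32

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def drop_reverse_complement_duplicates(tags):
    """Keep the first tag of each exact forward/reverse-complement pair.

    `tags` is a list of (name, sequence) tuples. Exact reverse-complement
    identity is an assumption made for this sketch.
    """
    seen = set()
    kept = []
    for name, seq in tags:
        seq = seq.upper()
        # Strand-independent key: the smaller of the two orientations.
        canonical = min(seq, reverse_complement(seq))
        if canonical in seen:
            continue
        seen.add(canonical)
        kept.append((name, seq))
    return kept


def build_pseudo_reference(tags, spacer=SPACER):
    """Concatenate tags into one pseudo-molecule.

    Returns the sequence and a dict of 0-based start offsets per tag name,
    so that SNP positions on the pseudo-molecule can be traced to tags.
    """
    offsets = {}
    parts = []
    position = 0
    for i, (name, seq) in enumerate(tags):
        if i > 0:
            parts.append(spacer)
            position += len(spacer)
        offsets[name] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets
```

Recording the start offset of every tag is a design choice of this sketch: it lets a variant position reported against the single pseudo-molecule be mapped back to the tag, and hence the linkage-map locus, it came from.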