10.5061/DRYAD.DK22G4H
Gage, Joseph L.
University of Wisconsin-Madison
Vaillancourt, Brieanne
Michigan State University
Hamilton, John P.
Michigan State University
Manrique-Carpintero, Norma C.
Michigan State University
Gustafson, Timothy J.
Monsanto Company; 7202 Portage Road DeForest WI 53532
Barry, Kerrie
Joint Genome Institute
Lipzen, Anna
Joint Genome Institute
Tracy, William F.
University of Wisconsin-Madison
Mikel, Mark A.
University of Illinois at Urbana Champaign
Kaeppler, Shawn M.
University of Wisconsin-Madison
Buell, C. Robin
Michigan State University
de Leon, Natalia
University of Wisconsin-Madison
Data from: Multiple maize reference genomes impact the identification of
variants by GWAS in a diverse inbred panel
Dryad
dataset
2019
2019-02-13T17:40:37Z
2019-02-13T17:40:37Z
en
https://doi.org/10.3835/plantgenome2018.09.0069
7401187193 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Use of a single reference genome for genome-wide association studies
(GWAS) limits the gene space represented to that of a single accession.
This limitation can complicate identification and characterization of
genes located within presence/absence variations (PAVs). In this study, we
present the draft de novo genome assembly of PHJ89, an Oh43-type inbred
line. Using three separate reference genome assemblies (B73, PH207, and
PHJ89) that represent the predominant germplasm groups of maize, we
generated three separate whole-seedling gene expression profile and single
nucleotide polymorphism (SNP) matrices from a panel of 942 diverse inbred
lines. We identified 34,447 (B73), 39,672 (PH207), and 37,436 (PHJ89)
transcripts that are not present in the respective reference genome
assembly. GWAS was conducted in the 942-inbred panel using both the SNP
and expression data values to map sugarcane mosaic virus (SCMV)
resistance. Highlighting the impact of alternative reference genomes in
gene discovery, GWAS results for SCMV resistance using expression values
as a surrogate measure of PAV resulted in robust detection of the physical
location of a known resistance gene when using the B73 reference that
contains the gene, but not when using the PH207 reference. This study
provides the valuable resource of the Oh43-type PHJ89 genome assembly as
well as SNP and expression data for 942 individuals generated using three
different reference genomes.
data_dryad_readme_18Jan19.txt: Readme.
942_FPKM_B73_genes_w_feature.txt: Expression abundances of B73 v4 annotated genes (protein coding, tRNA, miRNA, and lincRNA genes).
942_FPKM_B73_RTAs.txt: Expression abundances of B73-derived novel transcripts.
942_FPKM_PH207_genes.txt: Expression abundances of PH207 annotated genes.
942_FPKM_PH207_RTAs.txt: Expression abundances of PH207-derived novel transcripts.
942_FPKM_PHJ89_genes.txt: Expression abundances of PHJ89 annotated genes.
942_FPKM_PHJ89_RTAs.txt: Expression abundances of PHJ89-derived novel transcripts.
942_FPKM_LOCONF_B73_genes_w_feature.txt: Expression abundances of B73 v4 annotated genes (protein coding, tRNA, miRNA, and lincRNA genes).
942_FPKM_LOCONF_B73_RTAs.txt: Expression abundances of B73-derived novel transcripts.
942_FPKM_LOCONF_PH207_genes.txt: Expression abundances of PH207 annotated genes.
942_FPKM_LOCONF_PH207_RTAs.txt: Expression abundances of PH207-derived novel transcripts.
942_FPKM_LOCONF_PHJ89_genes.txt: Expression abundances of PHJ89 annotated genes.
942_FPKM_LOCONF_PHJ89_RTAs.txt: Expression abundances of PHJ89-derived novel transcripts.
B73_plus_RTAs_snp_matrix_995785.txt.gz: SNP calls B73 plus B73-derived novel transcripts.
PH207_plus_RTAs_snp_matrix_988252.txt.gz: SNP calls PH207 plus PH207-derived novel transcripts.
PHJ89_plus_RTAs_snp_matrix_995238.txt.gz: SNP calls PHJ89 plus PHJ89-derived novel transcripts.
Trinity_B73_unmapped_transcriptome_assembly.fasta: B73-derived novel transcripts.
Trinity_PH207_unmapped_transcriptome_assembly.fasta: PH207-derived novel transcripts.
Trinity_PHJ89_unmapped_transcriptome_assembly.fasta: PHJ89-derived novel transcripts.
B73_plus_RTAs_snp_matrix_imputed.zip:
  widiv_942g_979873SNPs_imputed_filteredGenos_withRTA_AGPv4.hmp.txt: Imputed SNP calls B73 plus B73-derived novel transcripts. SNPs on contigs that are not part of the 10 chromosomes are coded as being on chromosome 11, and SNPs on RTAs are coded as being on chromosome 12.
  widiv_942g_column_name_converter.txt: Converts genotype names between unimputed (column 'Original') and imputed (column 'NoSpaces').
PH207_plus_RTAs_snp_matrix_imputed.zip:
  widiv_942g_971213SNPs_imputed_filteredGenos_withRTA_PH207ref.hmp.txt: Imputed SNP calls PH207 plus PH207-derived novel transcripts. SNPs on RTAs are coded as being on chromosome 11, and SNPs on contigs that are not part of the 10 chromosomes are coded as being on chromosome 12.
  widiv_942g_column_name_converter.txt: Converts genotype names between unimputed (column 'Original') and imputed (column 'NoSpaces').
phj89_final_asm.no_desc.min_1k.fa: Genome assembly of Z. mays PHJ89 - min1kb scaffolds.
phj89_final_asm.no_desc.fa: Genome assembly of Z. mays PHJ89 - all scaffolds.
BUSCO_result.zip: Folder contains the output files from running BUSCO on the PHJ89 genome assembly.
phj89_gene_models.hc.gff3: Generic feature format file (gff3) of high-confidence genes in PHJ89.
phj89_gene_models.hc.cdna.fa: Transcript sequences (cDNA) of high-confidence genes in PHJ89.
phj89_gene_models.hc.cds.fa: Coding sequences (CDS) of high-confidence genes in PHJ89.
phj89_gene_models.hc.pep.fa: Protein sequences of high-confidence genes in PHJ89.
phj89_gene_models.hc.func_anno.txt: Functional annotation of high-confidence genes in PHJ89.
phj89_gene_models.lc.gff3: Generic feature format file (gff3) of low-confidence genes in PHJ89.
phj89_gene_models.lc.cdna.fa: Transcript sequences (cDNA) of low-confidence genes in PHJ89.
phj89_gene_models.lc.cds.fa: Coding sequences (CDS) of low-confidence genes in PHJ89.
phj89_gene_models.lc.pep.fa: Protein sequences of low-confidence genes in PHJ89.
phj89_gene_models.lc.func_anno.txt: Functional annotation of low-confidence genes in PHJ89.
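
The imputed B73 and PH207 matrices use opposite chromosome codes for RTA-derived SNPs and for SNPs on unplaced contigs, so it may help to label these classes explicitly before any downstream analysis. The following is a minimal Python sketch, not code distributed with the dataset. It assumes that the .hmp.txt files are tab-delimited with a header column named 'chrom' (as in the common HapMap layout) and that the converter file is tab-delimited with the header columns 'Original' and 'NoSpaces'; neither assumption has been checked against the actual files. The function names and the toy data are hypothetical.

```python
import csv
import io

# Chromosome codes for SNPs outside the 10 chromosomes, taken from the file
# descriptions. The two references use opposite conventions.
SPECIAL_CHROMS = {
    "B73": {"11": "unplaced_contig", "12": "RTA"},
    "PH207": {"11": "RTA", "12": "unplaced_contig"},
}


def classify_chrom(chrom, reference):
    """Return 'RTA', 'unplaced_contig', or 'chromosome' for a chromosome code."""
    return SPECIAL_CHROMS[reference].get(str(chrom).strip(), "chromosome")


def count_snp_classes(handle, reference, delimiter="\t"):
    """Count SNP rows per class. Assumes a header column named 'chrom'."""
    counts = {}
    for row in csv.DictReader(handle, delimiter=delimiter):
        label = classify_chrom(row["chrom"], reference)
        counts[label] = counts.get(label, 0) + 1
    return counts


def load_name_map(handle, delimiter="\t"):
    """Map imputed ('NoSpaces') genotype names to unimputed ('Original') names."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return {row["NoSpaces"]: row["Original"] for row in reader}


# Toy stand-ins for the real files (hypothetical content, for illustration only).
toy_hmp = (
    "rs#\talleles\tchrom\tpos\tLine_A\n"
    "s1\tA/G\t1\t100\tA\n"
    "s2\tC/T\t11\t5\tC\n"
    "s3\tC/T\t12\t7\tT\n"
    "s4\tA/T\t12\t9\tA\n"
)
toy_converter = "Original\tNoSpaces\nLine A\tLine_A\n"

b73_counts = count_snp_classes(io.StringIO(toy_hmp), "B73")
ph207_counts = count_snp_classes(io.StringIO(toy_hmp), "PH207")
name_map = load_name_map(io.StringIO(toy_converter))
print(b73_counts, ph207_counts, name_map)
```

For the real data, the io.StringIO objects would be replaced by file handles opened on the files extracted from the two imputed .zip archives, with the delimiter adjusted if the files turn out not to be tab-delimited.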