10.5061/DRYAD.41B9G
Ma, Chuang
University of Arizona
Xin, Mingming
University of Arizona
Feldmann, Kenneth A.
University of Arizona
Wang, Xiangfeng
University of Arizona
Data from: Machine learning-based differential network analysis: a study
of stress-responsive transcriptomes in Arabidopsis thaliana
Dryad
dataset
2015
Systems biology
Arabidopsis thaliana
abiotic/environmental stress
differential network
Bioinformatics
Transcriptome analysis
gene coexpression network
2015-01-07T00:00:00Z
2015-01-07T00:00:00Z
en
https://doi.org/10.1105/tpc.113.121913
27590656 bytes
1
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Machine learning (ML) is an intelligent data mining technique that builds
a prediction model based on the learning of prior knowledge to recognize
patterns in large-scale data sets. We present an ML-based methodology for
transcriptome analysis via comparison of gene coexpression networks,
implemented as an R package called machine learning–based differential
network analysis (mlDNA), and apply this method to reanalyze a set of
abiotic stress expression data in Arabidopsis thaliana. mlDNA first
used an ML-based filtering process to remove nonexpressed, constitutively
expressed, or non-stress-responsive “noninformative” genes prior to
network construction, through learning the patterns of 32 expression
characteristics of known stress-related genes. The retained “informative”
genes were subsequently analyzed by ML-based network comparison to predict
candidate stress-related genes showing expression and network differences
between control and stress networks, based on 33 network topological
characteristics. Comparative evaluation of the network-centric and
gene-centric analytic methods showed that mlDNA substantially outperformed
traditional statistical testing–based differential expression analysis at
identifying stress-related genes, with markedly improved prediction
accuracy. To experimentally validate the mlDNA predictions, we selected 89
candidates out of the 1784 predicted salt stress–related genes with
available SALK T-DNA mutagenesis lines for phenotypic screening and
identified two previously unreported genes, mutants of which showed
salt-sensitive phenotypes.
Supplemental Dataset 1: Known Stress-related Genes Collected from the TAIR and DRASTIC Databases, their Expression Changes in the Stress Microarray Datasets, and the Statistical Results of their Gene Ontology (GO) Annotations
Supplemental Dataset 2: “Informative” Genes Obtained from PSOL-based ML Analysis for Gene Co-expression Network Construction under Six Studied Stresses in Two Tissues
Supplemental Dataset 3: Candidate Stress-related Genes Predicted by mlDNA
Supplemental Dataset 4: List of the Candidate Stress-related Genes Evidenced by a High-throughput Phenotypic Screening Experiment
Supplemental Dataset 5: Detailed Information for Gene Ontology (GO) Modules Enriched with Salt Stress-related Genes
Supplemental Dataset 6: List of Stress Shared Genes
Supplemental Dataset 7: List of Stress-Specific Genes
USA
Tucson
Arizona