10.5061/DRYAD.44J0ZPCF0
Wang, Jinyu
Iowa State University
Guo, Guo
Iowa State University
Guo, Tingting
Iowa State University
Dzievit, Matthew
Iowa State University
Xiaoqing, Xiaoqing
Iowa State University
Liu, Peng
Iowa State University
Price, Kevin
Iowa State University
Yu, Jianming
0000-0001-5326-3099
Iowa State University
Data and scripts for: Genetic dissection of seasonal vegetation index
dynamics in maize through aerial based high-throughput phenotyping
Dryad
dataset
2021
Agricultural biotechnology, GWAS, UAV, High throughput phenotyping, Growth
curve, Remote sensing, Maize, P-splines
National Institute of Food and Agriculture
https://ror.org/05qx3fv49
2017-67007-25942
2022-02-16T00:00:00Z
2022-02-16T00:00:00Z
en
https://doi.org/10.1002/tpg2.20155
https://doi.org/10.5281/zenodo.5389561
68815038 bytes
3
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Plant phenotyping under field conditions plays an important role in
agricultural research. Efficient and accurate high-throughput phenotyping
strategies enable a better connection between genotype and phenotype.
Unmanned aerial vehicle-based high-throughput phenotyping platforms
(UAV-HTPPs) provide novel opportunities for large-scale proximal
measurement of plant traits with high efficiency, high resolution, and low
cost. The objective of this study was to use time series normalized
difference vegetation index (NDVI) extracted from UAV-based multispectral
imagery to characterize its pattern across development and conduct genetic
dissection of NDVI in a large maize population. The time series NDVI data
from the multispectral sensor were obtained at 5 time points across the
growing season for 1,752 diverse maize accessions with a UAV-HTPP. Cluster
analysis of the acquired measurements classified 1,752 maize accessions
into 2 groups with distinct NDVI developmental trends. To capture the
dynamics underlying these static observations, a penalized-spline
(P-spline) model was used to obtain genotype-specific curve parameters.
Genome-wide association study (GWAS) using static NDVI values and curve
parameters as phenotypic traits detected signals significantly associated
with the traits. Additionally, GWAS using the projected NDVI values from
the P-splines models revealed the dynamic change of genetic effects,
indicating the role of gene-environment interplay in controlling NDVI
across the growing season. Our results demonstrated the utility of
ultra-high spatial resolution multispectral imagery, such as that acquired
using UAV-based remote sensing, for genetic dissection of NDVI.
The UAV system consisted of a DJI S900 UAV and an NIR-converted multispectral
Canon Rebel SL1 DSLR camera with an intervalometer and GPS. We conducted 5
UAV overflights across the growing season in 2017, scheduled around 5
growth stages (V4, V8, V12, VT, and R5). The following image processing
steps were applied to obtain high-quality data from the raw UAV images:
image pre-processing, orthomosaic generation, vegetation index (VI)
calculation, and plot-level data extraction. The plot-level NDVI mean was
calculated from the reflectance measurements in the red and NIR portions of
the spectrum from the transect area of each plot. NDVI values generated on
the -1 to 1 scale were rescaled by adding 1 and then multiplying by 128 to
convert them into the 0 to 255 range. Time series NDVI values were
obtained from five overflights for 1,752 diverse maize accessions. GWAS
was conducted for the static NDVI values and the curve parameters derived
across stages.
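
The rescaling described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' processing code: the function names are hypothetical, and the NDVI formula is the standard (NIR - red) / (NIR + red) definition, which the text implies but does not write out. Returning 0 for a pixel with no reflectance is also an assumption.

```python
def ndvi(nir, red):
    """Standard NDVI from NIR and red reflectance (assumed definition)."""
    total = nir + red
    if total == 0:
        return 0.0  # assumption: treat a pixel with no signal as NDVI 0
    return (nir - red) / total


def rescale_ndvi(value):
    """Map NDVI from the -1..1 scale as described: add 1, then multiply by 128.

    Note: an NDVI of exactly 1 maps to 256 under this formula; how that edge
    case was handled is not stated in the source.
    """
    return (value + 1.0) * 128.0


def restore_ndvi(scaled):
    """Invert the rescaling to recover NDVI on the -1..1 scale."""
    return scaled / 128.0 - 1.0


scaled = rescale_ndvi(ndvi(3.0, 1.0))  # NDVI 0.5 -> 192.0
original = restore_ndvi(scaled)        # back to 0.5
```

The inverse function may be useful when comparing the stored NDVI columns with values reported on the conventional -1 to 1 scale.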
## Dataset repository for the "Genetic Dissection of Seasonal Vegetation Index Dynamics in Maize through Aerial Based High-throughput Phenotyping" Project

### Outline of the repository

To guide visitors, this section briefly outlines the contents of the repository.

### NDVI Distribution Dataset

#### 1. Dataset for NDVI distribution analysis

##### A. Name of the data file: AmesDP_NDVI_hand_measured_trait_with_group_infor.txt

##### B. File Overview

- Number of variables/columns: 12
- Number of rows: 1752
- Variable List:
  - Group: Population information for each accession
  - New_Group: Population information for each accession; differs from 'Group' in that sweet corn and popcorn are classified by kernel-based criteria (regardless of genetic background, only whether the kernel type is sweet or pop)
  - Order: Manually defined accession order number
  - Taxa: Accession name
  - NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for days after planting
  - FT_2017: Flowering time measurement (DAP)
  - PH_2017: Plant height measurement (cm)
  - EH_2017: Ear height measurement (cm)

### NDVI Clustering And Population Structure Dataset
#### 1. Dataset for NDVI distribution analysis

##### A. Name of the data file: AmesDP_NDVI_hand_measured_trait_with_group_infor.txt

##### B. File Overview

- Same dataset as described above, so the file overview is omitted here

#### 2. Dataset for K-means clustering result

##### A. Name of the data file: AmesPanel_NDVI_data_with_clustering_infor_2cluster

##### B. File Overview

- Number of variables/columns: 6
- Number of rows: 1752
- Variable List:
  - Taxa: Accession name
  - NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for days after planting
  - cluster: Cluster category for each accession, obtained from K-means clustering analysis

#### 3. Dataset for tSNE analysis

##### A. Name of the data file: AmesDP_genome_hmp_m1m20_s10

##### B. File Overview

- Number of variables/columns: 1753
- Number of rows: 31674
- Variable List:
  - rs#: SNP ID
  - Remaining columns: one column per accession, named by accession name (taxa)
- SNP set used for tSNE analysis

#### 4. Dataset for plotting growth curve, clustering result and tSNE result

##### A. Name of the data file: AmesDP_2017_NDVI_cluster_tSNE_PCA_synthetic

##### B. File Overview

- Number of variables/columns: 17
- Number of rows: 1752
- Variable List:
  - Group: Population information for each accession
  - New_Group: Population information for each accession; differs from 'Group' in that sweet corn and popcorn are classified by kernel-based criteria (regardless of genetic background, only whether the kernel type is sweet or pop)
  - Order: Manually defined accession order number
  - Taxa: Accession name
  - NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for days after planting
  - cluster: Cluster category for each accession, obtained from K-means clustering analysis
  - pheno_tsne_Y1: Value on the first dimension of the tSNE result with phenotype data
  - pheno_tsne_Y2: Value on the second dimension of the tSNE result with phenotype data
  - geno_tsne_Y1: Value on the first dimension of the tSNE result with genotype data
  - geno_tsne_Y2: Value on the second dimension of the tSNE result with genotype data
  - PC1 - PC3 (3 variables): First, second, and third dimensions from PCA

### P-Spline Modeling NDVI growth Dataset
#### 1. Dataset for P-spline modeling

##### A. Name of the data file: AmesDP_NDVI_hand_measured_trait

##### B. File Overview

- Number of variables/columns: 9
- Number of rows: 1752
- Variable List: Taxa, NDVI_37DAP_2017 - NDVI_115DAP_2017, FT_2017, PH_2017, and EH_2017 are the same as in the dataset named AmesDP_NDVI_hand_measured_trait_with_group_infor.txt above

#### 2. Dataset for P-spline modeling

##### A. Name of the data file: AmesDP2017_NDVI_long

- An intermediate file, reshaped from the file named 'AmesDP_NDVI_hand_measured_trait' (wide format) into long format

##### B. File Overview

- Number of variables/columns: 4
- Number of rows: 1752
- Variable List:
  - Gen: Accession name
  - Genor: Manually assigned accession order
  - Date: Days after planting
  - NDVI: NDVI measurement

#### 3. Dataset for Psplines modeling parameter

##### A. Name of the data file: AmesDP_NDVI_Psplines_modeling_parameter.csv

##### B. File Overview

- Number of variables/columns: 4
- Number of rows: 1752
- Variable List:
  - Geno: Accession name
  - asymptote: Model-estimated maximum NDVI value
  - max_rate: Model-estimated maximum growth rate of NDVI
  - inflection_point: Time point with the maximum growth rate of NDVI

#### 4. Dataset for Psplines predicted NDVI value

##### A. Name of the data file: AmesDP_NDVI_Pspline_NDVI_prediction_by_1day.csv

##### B. File Overview

- Number of variables/columns: 3
- Number of rows: 1752
- Variable List:
  - geno: Accession name
  - biomassspline: Predicted biomass from 37 to 115 DAP at 1-day intervals; the predicted values are joined with '_'
  - growthratespline: Predicted growth rate from 37 to 115 DAP at 1-day intervals; the predicted values are joined with '_'

#### 5. Dataset for Psplines predicted NDVI value

##### A. Name of the data file: AmesDP2017_observed_Psplines_model_fitted_value

##### B. File Overview

- Number of variables/columns: 21
- Number of rows: 1752
- Variable List:
  - geno: Accession name
  - NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for days after planting
  - Pspline_37DAP - Pspline_115DAP (15 variables in total): Predicted NDVI value at 37, 44, 51, 58, 65, 72, 79, 86, 93, 100, 107, 114, 60, 73, and 115 DAP

#### 6. Dataset for plotting growth curve, and correlation between observed and predicted NDVI value

##### A. Name of the data file: AmesPanel2017_observed_Psplines_model_fitted_with_pop_structure

##### B. File Overview

- Number of variables/columns: 14
- Number of rows: 1752
- Variable List:
  - Group: Population information for each accession
  - New_Group: Population information for each accession; differs from 'Group' in that sweet corn and popcorn are classified by kernel-based criteria (regardless of genetic background, only whether the kernel type is sweet or pop)
  - Order: Manually defined accession order number
  - Taxa: Accession name
  - NDVI_37DAP_2017 - NDVI_115DAP_2017 (5 variables in total): NDVI measurements at 37, 44, 60, 73, and 115 DAP. DAP stands for days after planting
  - Pspline_37DAP - Pspline_115DAP (5 variables in total): Predicted NDVI value at 37, 44, 60, 73, and 115 DAP
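
The `biomassspline` and `growthratespline` columns of AmesDP_NDVI_Pspline_NDVI_prediction_by_1day.csv store each accession's daily predictions as a single '_'-joined string, so they need to be split before use. The Python sketch below shows one way to do that and to read off grid-based curve summaries. It is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the first value is assumed to correspond to 37 DAP with one value per day after that, and taking the maximum over the daily grid only approximates the model-estimated parameters in AmesDP_NDVI_Psplines_modeling_parameter.csv.

```python
def parse_joined(series, first_dap=37):
    """Split a '_'-joined string into a {DAP: value} mapping.

    Assumes one value per day, starting at first_dap (37 per the README).
    """
    return {first_dap + i: float(v) for i, v in enumerate(series.split("_"))}


def curve_summary(values, rates):
    """Grid-based summary of a fitted curve from daily predictions."""
    peak_day = max(rates, key=rates.get)  # day with the largest predicted rate
    return {
        "max_value": max(values.values()),  # rough stand-in for the asymptote
        "max_rate": rates[peak_day],
        "inflection_point": peak_day,
    }


# Toy strings for illustration only; not values from the dataset.
values = parse_joined("100_110_130_140_142")
rates = parse_joined("5_10_20_10_2")
summary = curve_summary(values, rates)
```

Keeping the parsed series keyed by DAP makes it straightforward to look up the prediction for a specific day, for example `values[40]`.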
### GWAS of NDVI and Pspline Curve Parameters Dataset

#### 1. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters

##### A. Name of the data file: GWAS-21M_SNPs.txt

##### B. File Overview

- Number of variables/columns: 2
- Number of rows: 21129389
- Variable List:
  - CHROM: Chromosome number
  - POS: Position on the corresponding chromosome

#### 2. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters

##### A. Name of the data file: NDVI_candidate_gene_list_FDR

##### B. File Overview

- Number of variables/columns: 13
- Number of rows: 93
- Variable List:
  - Trait: Trait name
  - Gene_name: Gene name
  - Gene_ID_v3: Gene ID from maize B73 genome Version 3
  - Gene_Chr_V3: Chromosome number based on B73 genome Version 3
  - Gene_S: Start position on chromosome
  - Gene_E: End position on chromosome
  - Tag_SNP: Tagged SNP position on chromosome
  - Distance: Distance from the tagged SNP to the gene
  - Abt_distance: Absolute distance from the tagged SNP to the gene
  - Alias: Gene alias name
  - Significant: Whether the tagged SNP is significantly associated with the trait
  - AtID: Gene ID in Arabidopsis
  - OsID: Gene ID in rice

#### 3. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters

##### A. Name of the data file: NDVI_GWAS_FDR_threshold

##### B. File Overview

- Number of variables/columns: 5
- Number of rows: 25
- Variable List:
  - Trait: Trait name
  - p_value: p-value obtained from GWAS
  - Threshold: Calculated FDR threshold

#### 4. Dataset for plotting Manhattan plot for NDVI and Pspline Curve Parameters

##### A. Name of the data files:

- AmesDP_NDVI_73DAP_2017_genome_wide_gwas_results_sorted_log2_wh
- AmesDP_NDVI_115DAP_2017_genome_wide_gwas_results_sorted_log2_wh
- AmesDP_max_rate_genome_wide_gwas_results_sorted_log2_wh
- AmesDP_asymptote_genome_wide_gwas_results_sorted_log2_wh

##### B. File Overview

- GWAS output from GAPIT for 4 different traits: NDVI_73DAP, NDVI_115DAP, asymptote, and max rate. These are standard GAPIT output files, so the file overview is omitted

### Dynamic Changes of Allelic Effect Dataset

##### A. Name of the data file: Psplines_SNP_effect

##### B. File Overview

- Number of variables/columns: 9
- Number of rows: 45
- Variable List:
  - SNP: SNP ID
  - Position: SNP position
  - P.value: P-value
  - maf: Minor allele frequency
  - Rsquare.of.Model.without.SNP: R-square of the model when not including the SNP
  - Rsquare.of.Model.with.SNP: R-square of the model when including the SNP
  - FDR_Adjusted_P.values: FDR-adjusted p-values
  - effect: SNP effect
  - Chro: Chromosome number
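
One common way to use the two R-square columns in Psplines_SNP_effect is to take their difference as a rough measure of how much phenotypic variance a SNP adds to the model. The Python sketch below illustrates that calculation. It is a hedged example rather than part of the authors' analysis: the helper name `r2_gain` is hypothetical, the rows are made-up values, and file parsing is left out because the file's delimiter is not documented here.

```python
def r2_gain(row):
    """Increase in model R-square when the SNP is added to the model."""
    with_snp = float(row["Rsquare.of.Model.with.SNP"])
    without_snp = float(row["Rsquare.of.Model.without.SNP"])
    return with_snp - without_snp


# Hypothetical rows for illustration only; not values from the dataset.
rows = [
    {"SNP": "snp_b", "Rsquare.of.Model.without.SNP": "0.50",
     "Rsquare.of.Model.with.SNP": "0.51"},
    {"SNP": "snp_a", "Rsquare.of.Model.without.SNP": "0.50",
     "Rsquare.of.Model.with.SNP": "0.53"},
]

# Rank SNPs from largest to smallest R-square gain.
ranked = sorted(rows, key=r2_gain, reverse=True)
```

Sorting by this difference gives a quick ordering of SNPs by apparent contribution, which can then be compared with the `effect` column.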