10.5061/DRYAD.MKKWH7123
Reeb, Rachel
0000-0003-4402-0268
University of Pittsburgh
Aziz, Naeem
University of Pittsburgh
Lapp, Samuel
University of Pittsburgh
Kitzes, Justin
0000-0001-7839-3594
University of Pittsburgh
Heberling, J. Mason
Carnegie Museum of Natural History
Kuebbing, Sara
0000-0002-0834-8189
University of Pittsburgh
Using convolutional neural networks to efficiently extract immense
phenological data from community science images
Dryad
dataset
2021
FOS: Biological sciences
Alliaria petiolata
Phenology
plant phenology
Machine learning
National Science Foundation
https://ror.org/021nxhr62
1747452
National Science Foundation
https://ror.org/021nxhr62
1936960
National Science Foundation
https://ror.org/021nxhr62
1936971
University of Pittsburgh
2022-01-04T00:00:00Z
2022-01-04T00:00:00Z
en
https://doi.org/10.3389/fpls.2021.787407
https://doi.org/10.5281/zenodo.5813100
1619248 bytes
4
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Community science image libraries offer a massive, but largely untapped,
source of observational data for phenological research. The iNaturalist
platform offers a particularly rich archive, containing more than 49
million verifiable, georeferenced, open access images, encompassing seven
continents and over 278,000 species. A critical limitation preventing
scientists from taking full advantage of this rich data source is labor.
Each image must be manually inspected and categorized by phenophase, which
is both time-intensive and costly. Consequently, researchers may only be
able to use a subset of the total number of images available in the
database. While iNaturalist has the potential to yield enough data for
high-resolution and spatially extensive studies, it requires more
efficient tools for phenological data extraction. A promising solution is
automation of the image annotation process using deep learning. Recent
innovations in deep learning have made these open-source tools accessible
to a general research audience. However, it is unknown whether deep
learning tools can accurately and efficiently annotate phenophases in
community science images. Here, we train a convolutional neural network
(CNN) to annotate images of Alliaria petiolata into distinct phenophases
from iNaturalist and compare the performance of the model with non-expert
human annotators. We demonstrate that researchers can successfully employ
deep learning techniques to extract phenological information from
community science images. A CNN classified two-stage phenology (flowering
and non-flowering) with 95.9% accuracy and classified four-stage phenology
(vegetative, budding, flowering, and fruiting) with 86.4% accuracy. The
overall accuracy of the CNN did not differ from that of the human annotators (p = 0.383), although performance varied across phenophases. We found that a primary challenge of using deep learning for image annotation was not related to the model itself, but rather to the quality of the community science images. Up to 4% of A. petiolata images in iNaturalist were taken from an
improper distance, were physically manipulated, or were digitally altered,
which limited both human and machine annotators in accurately classifying
phenology. Thus, we provide a list of photography guidelines that could be included in community science platforms to inform community scientists of best practices for creating images that facilitate phenological
analysis.
Creating a training and validation image set

We downloaded 40,761
research-grade observations of A. petiolata from iNaturalist, ranging from
1995 to 2020. Observations on the iNaturalist platform are considered
“research-grade” if the observation is verifiable (includes an image), includes the date and location observed, documents an organism growing wild (i.e. not
cultivated), and at least two-thirds of community users agree on the
species identification. From this dataset, we used a subset of images for
model training. Observations in the iNaturalist dataset are heavily skewed towards more recent years. Fewer than 5% of the images we downloaded (n=1,790) were uploaded between 1995 and 2016, while over
50% of the images were uploaded in 2020. To mitigate temporal bias, we
used all available images between the years 1995 and 2016 and we randomly
selected images uploaded between 2017 and 2020. We capped the number of randomly selected 2020 images at approximately the number of 2019 observations in the training set.
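As an illustration, this year-capping step could be done with pandas along the lines of the sketch below; the file name, column names, and per-year sample sizes are placeholders rather than the values used in this study.

```python
# Illustrative sketch of the temporal subsampling described above; assumes the
# downloaded records are in a CSV with a "year" column (names are placeholders).
import pandas as pd

obs = pd.read_csv("inaturalist_observations.csv")   # hypothetical export file

early = obs[obs["year"] <= 2016]                     # keep all 1995-2016 images
recent = (obs[obs["year"].between(2017, 2019)]
          .groupby("year", group_keys=False)
          .apply(lambda g: g.sample(n=min(len(g), 3000), random_state=1)))  # placeholder per-year cap
cap_2020 = int((recent["year"] == 2019).sum())       # cap 2020 near the 2019 count
from_2020 = obs[obs["year"] == 2020].sample(n=cap_2020, random_state=1)

training_pool = pd.concat([early, recent, from_2020], ignore_index=True)
```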
The annotated observation records are available in the supplement (supplementary data sheet 1). The majority of the unprocessed records
(those which hold a CC-BY-NC license) are also available on GBIF.org
(2021). One of us (R. Reeb) annotated the phenology of training and
validation set images using two different classification schemes:
two-stage (non-flowering, flowering) and four-stage (vegetative, budding,
flowering, fruiting). For the two-stage scheme, we classified 12,277
images and designated images as ‘flowering’ if there were one or more open
flowers on the plant. All other images were classified as non-flowering.
For the four-stage scheme, we classified 12,758 images. We classified
images as ‘vegetative’ if no reproductive parts were present, ‘budding’ if
one or more unopened flower buds were present, ‘flowering’ if at least one
opened flower was present, and ‘fruiting’ if at least one fully-formed
fruit was present (with no remaining flower petals attached at the base).
Phenology categories were discrete; if there was more than one type of
reproductive organ on the plant, the image was labeled based on the latest
phenophase (e.g. if both flowers and fruits were present, the image was
classified as fruiting). For both classification schemes, we only included
images in the model training and validation dataset if the image contained one or more plants with clearly visible reproductive parts and we could exclude the possibility of a later phenophase. We removed 1.6% of
images from the two-stage dataset that did not meet this requirement,
leaving us with a total of 12,077 images, and 4.0% of the images from the four-stage dataset, leaving us with a total of 12,237 images. We then split the two-stage and four-stage datasets into a model training dataset (80% of each dataset) and a validation dataset (20% of each dataset).
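A split of this kind can be reproduced with scikit-learn, as in the sketch below; the DataFrame and column names are assumptions, and the stratified option is shown only as one reasonable choice.

```python
# Illustrative 80/20 split of an annotated dataset into training and validation
# sets; "annotated_df" and its "label" column are assumed placeholder names.
from sklearn.model_selection import train_test_split

train_df, val_df = train_test_split(
    annotated_df,                    # annotated two- or four-stage dataset
    test_size=0.20,                  # 20% held out for validation
    stratify=annotated_df["label"],  # keep phenophase proportions similar
    random_state=1,
)
```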
Training a two-stage and four-stage CNN

We adapted techniques from studies applying
machine learning to herbarium specimens for use with community science
images (Lorieul et al. 2019; Pearson et al. 2020). We used transfer
learning to speed up training of the model and reduce the size
requirements for our labeled dataset. This approach uses a model that has
been pre-trained using a large dataset and so is already competent at
basic tasks such as detecting lines and shapes in images. We trained a
neural network (ResNet-18) using the PyTorch machine learning library (Paszke et al. 2019) within Python. We chose the ResNet-18 neural network
because it had fewer convolutional layers and thus was less
computationally intensive than pre-trained neural networks with more
layers. In early testing, we reached the desired accuracy with the two-stage
model using ResNet-18. ResNet-18 was pre-trained using the ImageNet
dataset, which has 1,281,167 images for training (Deng et al. 2009). We
utilized default parameters for batch size (4), learning rate (0.001),
optimizer (stochastic gradient descent), and loss function (cross entropy
loss). Because this led to satisfactory performance, we did not further
investigate hyperparameters. Because the ImageNet dataset has 1,000
classes while our data was labeled with either 2 or 4 classes, we replaced
the final fully-connected layer of the ResNet-18 architecture with
fully-connected layers containing an output size of 2 for the 2-class
problem and 4 for the 4-class problem. We resized and cropped the images to fit ResNet’s input size of 224x224 pixels and normalized the distribution of the RGB values in each image to a mean of zero and a standard deviation of one to simplify model calculations.
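The sketch below illustrates this setup in PyTorch: an ImageNet-pretrained ResNet-18 with its final fully-connected layer replaced, the default hyperparameters listed above, and 224x224 preprocessing. The per-channel normalization constants shown are the common ImageNet convention and stand in for the zero-mean, unit-variance scaling described here.

```python
# Illustrative transfer-learning setup: pretrained ResNet-18, new output layer,
# default hyperparameters (batch size 4, lr 0.001, SGD, cross-entropy loss).
import torch
import torch.nn as nn
from torchvision import models, transforms

num_classes = 4  # 2 for the two-stage scheme, 4 for the four-stage scheme

model = models.resnet18(pretrained=True)                 # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, num_classes)  # replace final fully-connected layer

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),                           # ResNet input size: 224x224
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],      # placeholder normalization constants
                         std=[0.229, 0.224, 0.225]),
])

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
```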
During training, the CNN makes predictions on the labeled data from the training set and calculates a loss value that quantifies the model’s inaccuracy. The gradient of the loss with respect to the model parameters is computed, and the parameters are then updated to minimize the loss.
After this training step, model performance is estimated by making
predictions on the validation dataset. The model is not updated during
this process, so that the validation data remains ‘unseen’ by the model
(Rawat and Wang 2017; Tetko et al. 1995). This cycle is repeated until the
desired level of accuracy is reached. We trained our model for 25 of these
cycles, or epochs. We stopped training at 25 epochs to prevent overfitting, where the model becomes tuned too specifically to the training images and begins to lose accuracy on images in the validation dataset (Tetko et al. 1995).
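Continuing from the setup sketched above, a minimal version of this train-and-validate cycle over 25 epochs might look as follows; train_loader and val_loader are assumed to be PyTorch DataLoaders (batch size 4) over the labeled training and validation images.

```python
# Illustrative training loop: one gradient-descent pass over the training data
# per epoch, followed by a validation pass with no parameter updates.
for epoch in range(25):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(images)              # predictions on labeled training images
        loss = criterion(outputs, labels)    # quantify the model's inaccuracy
        loss.backward()                      # gradient of the loss w.r.t. parameters
        optimizer.step()                     # update parameters to reduce the loss

    model.eval()                             # validation data stays 'unseen' (no updates)
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch + 1}: validation accuracy = {correct / total:.3f}")
```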
We evaluated model accuracy and created confusion matrices using the model’s predictions on the labeled validation data. This allowed us to quantify the model’s overall accuracy and identify which specific categories were most difficult for the model to distinguish.
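For illustration, the accuracy and confusion-matrix evaluation could be computed as in the sketch below, here using scikit-learn for convenience; val_loader is the assumed validation DataLoader from above.

```python
# Illustrative evaluation on the labeled validation set: overall accuracy and a
# confusion matrix showing which categories the model confuses most often.
import torch
from sklearn.metrics import accuracy_score, confusion_matrix

model.eval()
all_preds, all_labels = [], []
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images).argmax(dim=1)
        all_preds.extend(preds.tolist())
        all_labels.extend(labels.tolist())

print("validation accuracy:", accuracy_score(all_labels, all_preds))
print(confusion_matrix(all_labels, all_preds))  # rows: true class, columns: predicted class
```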
To use the model to make phenology predictions on the full, 40,761-image dataset, we created a custom PyTorch Dataset class and dataloader that load the images listed in a CSV file and pass them through the model, keeping each prediction associated with its unique image ID.
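A custom Dataset of this kind might look like the sketch below; the CSV path and column names ("image_id", "file_path") are illustrative assumptions, and "preprocess" is the transform pipeline sketched earlier.

```python
# Illustrative custom PyTorch Dataset that loads images listed in a CSV and
# keeps each prediction tied to its unique image ID.
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class CSVImageDataset(Dataset):
    def __init__(self, csv_path, transform):
        self.records = pd.read_csv(csv_path)   # one row per image, with ID and file path
        self.transform = transform

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records.iloc[idx]
        image = Image.open(row["file_path"]).convert("RGB")
        return self.transform(image), str(row["image_id"])

# Run the trained model over the full image set, pairing predictions with IDs.
loader = DataLoader(CSVImageDataset("all_images.csv", preprocess), batch_size=4)
model.eval()
predictions = {}
with torch.no_grad():
    for images, image_ids in loader:
        preds = model(images).argmax(dim=1)
        for image_id, pred in zip(image_ids, preds.tolist()):
            predictions[image_id] = pred
```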
Hardware information

Model training was conducted using a personal laptop (Ryzen 5 3500U CPU and 8 GB of memory) and a desktop computer (Ryzen 5 3600 CPU, NVIDIA RTX 3070 GPU, and 16 GB of memory).

Comparing CNN accuracy to human annotation accuracy

We compared the accuracy of the trained CNN to the
accuracy of seven inexperienced human scorers annotating a random
subsample of 250 images from the full, 40,761 image dataset. An expert
annotator (R. Reeb, who has over a year’s experience in annotating A.
petiolata phenology) first classified the subsample images using the
four-stage phenology classification scheme (vegetative, budding,
flowering, fruiting). Nine images could not be classified for phenology
and were removed. Next, seven non-expert annotators classified the 241
subsample images using an identical protocol. This group represented a variety of levels of familiarity with A. petiolata phenology,
ranging from no research experience to extensive research experience (two
or more years working with this species). However, no one in the group had
substantial experience classifying community science images and all were
naïve to the four-stage phenology scoring protocol. The trained CNN was
also used to classify the subsample images. We compared human annotation
accuracy in each phenophase to the accuracy of the CNN using Student’s t-tests.
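For illustration, this comparison could be run as a one-sample Student's t-test of the seven annotators' accuracies against the CNN's accuracy in a given phenophase, as sketched below with placeholder values; the exact test configuration may differ from the study's analysis.

```python
# Illustrative one-sample t-test comparing human annotator accuracies to the
# CNN's accuracy for one phenophase; all numbers below are placeholders.
from scipy import stats

human_accuracies = [0.84, 0.88, 0.81, 0.86, 0.90, 0.83, 0.87]  # seven non-expert annotators
cnn_accuracy = 0.86                                             # trained CNN on the same images

t_stat, p_value = stats.ttest_1samp(human_accuracies, cnn_accuracy)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```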
The model and human-annotated subsample data can be found in the supplement (supplementary data sheet 2). This research is exempt from University of Pittsburgh IRB approval according to the University’s Exempt Criteria 45 CFR 46.104(d)(2).

Unclassifiable images

Within the four-stage
training and validation dataset, we removed 4% of plant images that could
not be classified into a phenological stage. To quantitatively assess the
cause of unclassifiable images, the experienced annotator (R. Reeb) labeled
these images in one of six categories: 1) camera distance (camera was too
far or too close to the plant to classify phenology), 2) physical
manipulation (the plant was no longer rooted in the ground), 3) digital
manipulation (the image was digitally altered or was copied from a
secondary source), 4) senesced plant (no remaining leaves), 5)
misidentified species (image did not contain A. petiolata), and 6)
duplicate entry (an image had been logged two or more times by the same
user).