10.25338/B8031W
Guillon, Hervé
0000-0002-6297-8253
University of California, Davis
Byrne, Colin F.
0000-0003-4752-2503
University of California, Davis
Lane, Belize Arela Albin
0000-0003-2331-7038
Utah State University
Sandoval Solis, Samuel
University of California, Davis
Pasternack, Gregory Brian
University of California, Davis
Channel types predictions for the Sacramento River basin
Dryad
dataset
2020
Water management
California Environmental Protection Agency
https://ror.org/02gkqqp86
16-062-300
United States Department of Agriculture
https://ror.org/01na82s61
CA‐D‐LAW‐7034‐H
United States Department of Agriculture
https://ror.org/01na82s61
CA‐D‐LAW‐2243‐H
2020-02-25T00:00:00Z
2020-02-25T00:00:00Z
en
18510490 bytes
4
CC0 1.0 Universal (CC0 1.0) Public Domain Dedication
Hydrologic and geomorphic classifications have gained traction in response
to the increasing need for basin-wide water resources management.
Regardless of the selected classification scheme, an open scientific
challenge is how to extend information from limited field sites to
classify tens of thousands to millions of channel reaches across a basin.
To address this spatial scaling challenge, we leveraged machine learning
to predict reach-scale geomorphic channel types using publicly available
geospatial data.
A bottom-up machine learning approach selects the most accurate and stable
model among ~96,000 candidates and derives the relationship between 147
predictors and labels corresponding to regional channel types in a
three-tiered framework that (i) defines a tractable problem, (ii) assesses
model performance in statistical learning, and (iii) assesses model
performance in prediction. In the present
application to the Sacramento River basin (California, USA), the developed
framework selects a Random Forest model to predict 10 channel types,
previously determined from 290 field surveys, over 108,943 200-m stream
intervals. Performance in statistical learning is high with a 65% median
cross-validation accuracy and a 0.91 mean multiclass Area Under Curve
value. Furthermore, the predictions coherently capture the expected
geomorphic organization of the landscape. As the main metric of uncertainty,
we include for each stream segment the entropy calculated from the
posterior probabilities output by the machine learning algorithm. For
completeness, evenness and richness are also reported. The predictions
included in this dataset correspond to an aggregated version of the
output from the machine learning framework. The initial 200-m stream
intervals were aggregated using their COMIDs from the National
Hydrography Dataset (NHD), the identifier most commonly used by
stakeholders. For each resulting NHD stream line, the 200-m scale
probabilities associated with each channel type are summed and normalized
to sum to one.
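
The aggregation step described above can be sketched as follows. This is a minimal illustration, not the code used to produce the dataset: the function name `aggregate_by_comid` and the input layout (pairs of COMID and a list of ten probabilities ordered SAC01 to SAC10) are assumptions made for the example.

```python
from collections import defaultdict

# Channel type labels as listed in the attribute table (SAC01 ... SAC10).
CHANNEL_TYPES = [f"SAC{i:02d}" for i in range(1, 11)]


def aggregate_by_comid(intervals):
    """Sum 200-m interval probabilities per COMID, then normalize to sum to one.

    `intervals` is an iterable of (comid, probs) pairs, where `probs` holds one
    posterior probability per channel type (hypothetical input layout).
    """
    sums = defaultdict(lambda: [0.0] * len(CHANNEL_TYPES))
    for comid, probs in intervals:
        acc = sums[comid]
        for k, p in enumerate(probs):
            acc[k] += p

    aggregated = {}
    for comid, acc in sums.items():
        total = sum(acc)
        if total > 0:  # skip degenerate stream lines with no probability mass
            aggregated[comid] = [v / total for v in acc]
    return aggregated


# Two 200-m intervals sharing one (made-up) COMID.
example = [
    (101, [0.5, 0.5] + [0.0] * 8),
    (101, [1.0, 0.0] + [0.0] * 8),
]
result = aggregate_by_comid(example)
```

In this toy input the summed probabilities for SAC01 and SAC02 are 1.5 and 0.5, so the normalized values for the stream line are 0.75 and 0.25.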
Format: shapefile (.shp)

Spatial reference:
Projection: California Albers
Datum: NAD83
Units: meters

Attributes:
COMID: Common identifier of the NHD feature
FDATE: Feature Currency Date
RESOLUTION: Always "Medium"
GNIS_ID: Geographic Names Information System ID for the value in GNIS_Name
GNIS_NAME: Feature Name from the Geographic Names Information System
LENGTHKM: Feature length in kilometers
REACHCODE: Reach Code assigned to feature
FLOWDIR: Flow direction is “WithDigitized” or “Uninitialized”
WBAREACOMI: ComID of an NHD polygonal water feature through which an NHD “Artificial Path” flowline flows
FTYPE: NHD Feature Type
FCODE: Numeric codes for various feature attributes in the NHDFCode lookup table
SHAPE_LENG: Feature length in decimal degrees
ENABLED: Always "True"
GNIS_NBR: Internal field for data processing
group: most probable channel types
shannon: Shannon's entropy calculated from the posterior probabilities
richness: Richness calculated from posterior probabilities
evenness: Evenness calculated from posterior probabilities
SAC01: posterior probability for channel type SAC01 (i.e. membership)
SAC02: posterior probability for channel type SAC02 (i.e. membership)
SAC03: posterior probability for channel type SAC03 (i.e. membership)
SAC04: posterior probability for channel type SAC04 (i.e. membership)
SAC05: posterior probability for channel type SAC05 (i.e. membership)
SAC06: posterior probability for channel type SAC06 (i.e. membership)
SAC07: posterior probability for channel type SAC07 (i.e. membership)
SAC08: posterior probability for channel type SAC08 (i.e. membership)
SAC09: posterior probability for channel type SAC09 (i.e. membership)
SAC10: posterior probability for channel type SAC10 (i.e. membership)

The ten channel types are:
SAC01: Unconfined, boulder-bedrock, bed-undulating
SAC02: Confined, boulder, high-gradient, step-pool/cascade
SAC03: Confined, boulder-bedrock, uniform
SAC04: Confined, boulder-bedrock, low-gradient step-pool
SAC05: Confined, gravel-cobble, uniform
SAC06: Partly-confined, low width-to-depth, gravel-cobble, riffle-pool
SAC07: Partly-confined, cobble-boulder, uniform
SAC08: Partly-confined, high width-to-depth, gravel-cobble, riffle-pool
SAC09: Unconfined, low width-to-depth, gravel
SAC10: Unconfined, gravel-cobble, riffle-pool
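
As an illustration of how the uncertainty attributes relate to the SAC01 to SAC10 probabilities, the sketch below recomputes them for a single record. It rests on assumptions not stated in this record: Shannon's entropy uses the natural logarithm, richness is the number of channel types with non-zero probability, and evenness is Pielou's index (entropy divided by the logarithm of richness). The values stored in the shapefile may follow different conventions, and the function names are hypothetical.

```python
import math

# Channel type labels as listed in the attribute table (SAC01 ... SAC10).
CHANNEL_TYPES = [f"SAC{i:02d}" for i in range(1, 11)]


def uncertainty_metrics(probs):
    """Return (shannon, richness, evenness) for one set of posterior probabilities.

    Assumed definitions: natural-log Shannon entropy, richness as the count of
    non-zero probabilities, and Pielou's evenness H / ln(richness).
    """
    positive = [p for p in probs if p > 0]
    shannon = -sum(p * math.log(p) for p in positive)
    richness = len(positive)
    evenness = shannon / math.log(richness) if richness > 1 else 0.0
    return shannon, richness, evenness


def most_probable(record):
    """Return the channel type label with the highest posterior probability."""
    return max(CHANNEL_TYPES, key=lambda label: record[label])


# A made-up attribute record with probability mass on two channel types.
record = {label: 0.0 for label in CHANNEL_TYPES}
record["SAC05"] = 0.75
record["SAC08"] = 0.25

shannon, richness, evenness = uncertainty_metrics([record[t] for t in CHANNEL_TYPES])
group = most_probable(record)
```

A record with all probability on one channel type gives zero entropy under these definitions, while equal probability across all ten types gives the maximum entropy, ln(10), and an evenness of one.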