Multi-Source Uncertainty Aware Fusion for Soil Moisture Estimation

[Image: CYGNSS satellite constellation]
Python · Machine Learning · Remote Sensing · Uncertainty Quantification · GNSS-R · Optimal Transport · Model Fusion

Brief

Combines time-series forecasts of SMAP L4 soil moisture (which bridge the product's ~2.5-day latency) with probabilistic retrievals from CYGNSS (current but sparse and noisy satellite observations) into a single, calibrated soil-moisture estimate with quantified uncertainty. Fusion uses optimal transport to blend the two predictive distributions rather than simply averaging point estimates.

The Problem

SMAP L4 is a data assimilation product that delivers high-quality soil moisture estimates, but with a latency of about 2.5 days: by the time the data is available, conditions have already changed. CYGNSS provides direct satellite observations that reflect current conditions, but it passes over any given location only sparsely and its measurements carry observation noise.

The goal is to combine these two complementary information sources: leverage SMAP's temporal coherence and bias correction while incorporating CYGNSS's real-time responsiveness. We fuse the full predictive distributions using a Wasserstein-2 (W₂) barycenter and compare several weighting strategies for balancing the two sources.

Method

Forecasting Leg: SMAP L4 Time-Series

An ARIMAX (autoregressive integrated moving average with exogenous inputs) model propagates historical SMAP L4 soil moisture forward in time, producing a Gaussian predictive distribution for each forecast horizon H ∈ {2,3,4,5} days ahead.
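
As a rough illustration of how a forecasting leg can turn a time series into one Gaussian predictive distribution per horizon, the sketch below fits an AR(1) model with a single exogenous regressor by least squares. It is a deliberately simplified stand-in for ARIMAX (no differencing or moving-average terms, and parameter uncertainty is ignored). The function names, the synthetic data, and the assumption that future exogenous inputs are known are all hypothetical and not taken from the project.

```python
import numpy as np

def fit_arx1(y, x):
    """Least-squares fit of y[t] = c + phi*y[t-1] + beta*x[t] + noise."""
    A = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    coef, *_ = np.linalg.lstsq(A, y[1:], rcond=None)
    resid = y[1:] - A @ coef
    sigma2 = resid @ resid / (len(resid) - 3)  # residual variance, 3 fitted parameters
    return coef, sigma2

def forecast_gaussian(y_last, x_future, coef, sigma2):
    """Return (means, stds) of the Gaussian forecast at horizons 1..len(x_future)."""
    c, phi, beta = coef
    means, stds = [], []
    mean, var = y_last, 0.0
    for x_t in x_future:
        mean = c + phi * mean + beta * x_t
        var = phi**2 * var + sigma2  # forecast variance accumulates with horizon
        means.append(mean)
        stds.append(np.sqrt(var))
    return np.array(means), np.array(stds)

# Synthetic demonstration series (not data from the study).
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)
y = np.empty(n)
y[0] = 0.2
for t in range(1, n):
    y[t] = 0.02 + 0.9 * y[t - 1] + 0.005 * x[t] + rng.normal(scale=0.01)

coef, sigma2 = fit_arx1(y, x)
x_future = np.zeros(5)  # assumed-known exogenous inputs for days 1..5
means, stds = forecast_gaussian(y[-1], x_future, coef, sigma2)
for h in (2, 3, 4, 5):
    print(f"H={h}: mean={means[h - 1]:.4f}, std={stds[h - 1]:.4f}")
```

The predictive standard deviation widens as H grows, which is the property the fusion step relies on when it weighs the forecast against the retrieval.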

Retrieval Leg: CYGNSS Probabilistic Regression

An NGBoost (Natural Gradient Boosting) model is trained on CYGNSS delay–Doppler observations to produce a direct, real-time estimate of soil moisture as a probability distribution. This source responds to current satellite data but is noisier than the SMAP-based forecast.

Fusion: Wasserstein-2 Barycenter

Rather than averaging the two predictions as numbers, we average them as probability distributions using a W₂ barycenter (weighted average of quantile functions). This preserves uncertainty structure and enables calibrated prediction intervals.
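
To make the quantile-averaging idea concrete, here is a minimal sketch for two one-dimensional Gaussian predictions. In one dimension the W₂ barycenter is obtained by averaging quantile functions, and for Gaussians this reduces to averaging the means and the standard deviations with the same weights. The numeric values, the weight, and the function names are illustrative assumptions, and the convention that `w` is the weight on the retrieval is chosen here only for the example.

```python
import numpy as np
from statistics import NormalDist

def w2_barycenter_quantiles(q_a, q_b, w):
    """Blend two quantile functions sampled on the same probability grid.

    w is the weight on source b; (1 - w) goes to source a.
    """
    return (1.0 - w) * np.asarray(q_a) + w * np.asarray(q_b)

def w2_barycenter_gaussian(mu_a, sd_a, mu_b, sd_b, w):
    """Closed form for two 1-D Gaussians: means and std devs average linearly."""
    return (1.0 - w) * mu_a + w * mu_b, (1.0 - w) * sd_a + w * sd_b

# Hypothetical predictive distributions (volumetric soil moisture, m^3/m^3).
forecast = NormalDist(mu=0.20, sigma=0.03)
retrieval = NormalDist(mu=0.24, sigma=0.02)
w = 0.6  # illustrative weight on the retrieval

probs = np.linspace(0.01, 0.99, 99)
q_f = np.array([forecast.inv_cdf(p) for p in probs])
q_r = np.array([retrieval.inv_cdf(p) for p in probs])

q_fused = w2_barycenter_quantiles(q_f, q_r, w)
mu_fused, sd_fused = w2_barycenter_gaussian(0.20, 0.03, 0.24, 0.02, w)

# Compare the quantile-averaged curve with the closed-form Gaussian barycenter.
q_closed = np.array([NormalDist(mu_fused, sd_fused).inv_cdf(p) for p in probs])
print("max difference:", float(np.max(np.abs(q_fused - q_closed))))
```

Unlike a mixture of the two densities, the barycenter of two Gaussians stays Gaussian, so its spread is a blend of the two input spreads and does not grow with the gap between the means.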

Weight Selection: Five strategies are tuned on validation data and then held fixed for test evaluation:

  • Equal: 50–50 blend.
  • RMSE-tuned: Optimizes for lowest prediction error (but may overfit).
  • CRPS-tuned: Optimizes CRPS (Continuous Ranked Probability Score), a metric that rewards sharp, calibrated distributions.
  • qMSE-tuned: Minimizes mean squared error in quantile space — balances prediction accuracy and interval width.
  • L₂-CRPS: Linear pooling baseline using CRPS.
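
One way a CRPS-tuned weight could be selected is a simple grid search on validation data, sketched below for Gaussian inputs using the closed-form Gaussian CRPS. The grid resolution, the synthetic validation set, and the names `crps_gaussian` and `tune_weight` are assumptions made for illustration. The project's actual optimizer and its qMSE objective are not specified here and are not reproduced.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)  # element-wise error function without SciPy

def crps_gaussian(mu, sd, y):
    """Closed-form CRPS of N(mu, sd^2) against observations y (lower is better)."""
    z = (y - mu) / sd
    pdf = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    cdf = 0.5 * (1.0 + _erf(z / np.sqrt(2.0)))
    return sd * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / np.sqrt(np.pi))

def tune_weight(mu_a, sd_a, mu_b, sd_b, y, n_grid=101):
    """Pick the Gaussian W2-barycenter weight on source b that minimizes mean CRPS."""
    grid = np.linspace(0.0, 1.0, n_grid)
    scores = []
    for w in grid:
        mu = (1.0 - w) * mu_a + w * mu_b
        sd = (1.0 - w) * sd_a + w * sd_b
        scores.append(float(crps_gaussian(mu, sd, y).mean()))
    best = int(np.argmin(scores))
    return float(grid[best]), scores[best]

# Synthetic validation set (not data from the study).
rng = np.random.default_rng(1)
truth = rng.uniform(0.1, 0.3, size=500)
mu_a = truth + rng.normal(scale=0.03, size=500)  # noisier source
mu_b = truth + rng.normal(scale=0.02, size=500)  # less noisy source
sd_a = np.full(500, 0.03)
sd_b = np.full(500, 0.02)

w_best, crps_best = tune_weight(mu_a, sd_a, mu_b, sd_b, truth)
print(f"selected weight on source b: {w_best:.2f}, validation CRPS: {crps_best:.4f}")
```

Once chosen, the weight would be frozen and applied unchanged to the test period, matching the tune-then-fix protocol described above.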

Results at Horizon H=3 (3-day forecast)

Experiments were conducted at three soil moisture monitoring stations (jr1, jr2, jr3) from the SoilSCAPE network in New Mexico. These sites provide the in-situ reference measurements used for validation.

Evaluation Metrics:

  • RMSE: Root mean squared error (m³/m³) — prediction accuracy.
  • CRPS: Continuous ranked probability score — measures both sharpness and calibration of the predictive distribution.
  • ECE: Expected calibration error — difference between predicted and observed coverage of prediction intervals. Lower = better calibrated.
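
The sketch below shows one plausible way to compute RMSE and a coverage-based ECE for Gaussian predictions. The ECE variant used here (mean absolute gap between nominal and observed coverage of central intervals at levels 0.1 to 0.9) is an assumption, since the exact binning used in the project is not stated. The data is synthetic and the function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def rmse(mu, y):
    """Root mean squared error of the predictive means."""
    return float(np.sqrt(np.mean((mu - y) ** 2)))

def interval_ece(mu, sd, y, levels=None):
    """Mean absolute gap between nominal and observed central-interval coverage."""
    if levels is None:
        levels = np.linspace(0.1, 0.9, 9)
    gaps = []
    for p in levels:
        # Half-width of the central interval with nominal coverage p.
        half_width = NormalDist().inv_cdf(0.5 + p / 2.0) * sd
        observed = np.mean(np.abs(y - mu) <= half_width)
        gaps.append(abs(observed - p))
    return float(np.mean(gaps))

# Synthetic check (not data from the study): same means, two stated spreads.
rng = np.random.default_rng(2)
y = rng.uniform(0.1, 0.3, size=5000)
mu = y + rng.normal(scale=0.02, size=5000)
sd_good = np.full(5000, 0.02)    # spread matches the true error
sd_narrow = np.full(5000, 0.01)  # overconfident intervals

ece_good = interval_ece(mu, sd_good, y)
ece_narrow = interval_ece(mu, sd_narrow, y)
print(f"RMSE={rmse(mu, y):.4f}, ECE (matched)={ece_good:.3f}, ECE (too narrow)={ece_narrow:.3f}")
```

Both cases share the same RMSE, so the example isolates what ECE adds: it penalizes intervals that are too narrow or too wide even when point accuracy is unchanged.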
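
| Site | Method | Weight (w) | RMSE (m³/m³) | CRPS | ECE |
|------|--------|------------|--------------|------|-----|
| JR1 | CYGNSS Retrieval | n/a | 0.0280 | 0.0153 | 0.077 |
| JR1 | SMAP Forecast | n/a | 0.0375 | 0.0193 | 0.057 |
| JR1 | W₂ (Equal) | 0.5000 | 0.0290 | 0.0153 | 0.054 |
| JR1 | W₂ (qMSE) | 0.6384 | 0.0278 | 0.0148 | 0.019 |
| JR2 | CYGNSS Retrieval | n/a | 0.0193 | 0.0106 | 0.102 |
| JR2 | SMAP Forecast | n/a | 0.0244 | 0.0117 | 0.127 |
| JR2 | W₂ (Equal) | 0.5000 | 0.0184 | 0.0093 | 0.086 |
| JR2 | W₂ (qMSE) | 0.6632 | 0.0178 | 0.0093 | 0.030 |
| JR3 | CYGNSS Retrieval | n/a | 0.0243 | 0.0132 | 0.129 |
| JR3 | SMAP Forecast | n/a | 0.0317 | 0.0161 | 0.097 |
| JR3 | W₂ (Equal) | 0.5000 | 0.0247 | 0.0125 | 0.059 |
| JR3 | W₂ (qMSE) | 0.5949 | 0.0240 | 0.0122 | 0.014 |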

Key Findings:

  • qMSE weighting is best at all three sites: Among the methods listed, it achieves the lowest RMSE at jr1, jr2, and jr3 and the lowest CRPS (tied with equal weighting at jr2). It also cuts ECE to 0.019, 0.030, and 0.014, versus 0.054, 0.086, and 0.059 for naive equal weighting.
  • JR2 has the lowest error: qMSE fusion reaches an RMSE of 0.0178 (vs. 0.0244 for the forecast alone and 0.0193 for the retrieval alone), while ECE falls to 0.030 from 0.127 and 0.102 for the individual sources.
  • Complementarity confirmed: The CYGNSS retrieval has lower RMSE and CRPS at every site, while the SMAP forecast is better calibrated at JR1 and JR3. qMSE-weighted fusion improves on both sources in RMSE, CRPS, and ECE at all three sites by combining SMAP temporal structure with CYGNSS real-time observations.
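
The size of these gains can be read off the H=3 table with a few lines of arithmetic. The snippet below computes the relative RMSE reduction of qMSE fusion against each single source, using only the RMSE values reported in the results table. The dictionary layout and the helper name `pct_reduction` are illustrative choices.

```python
# RMSE values (m^3/m^3) at H=3, copied from the results table.
rmse_table = {
    "JR1": {"retrieval": 0.0280, "forecast": 0.0375, "fused_qmse": 0.0278},
    "JR2": {"retrieval": 0.0193, "forecast": 0.0244, "fused_qmse": 0.0178},
    "JR3": {"retrieval": 0.0243, "forecast": 0.0317, "fused_qmse": 0.0240},
}

def pct_reduction(baseline, fused):
    """Relative RMSE reduction of the fused estimate, in percent of the baseline."""
    return 100.0 * (baseline - fused) / baseline

reductions = {
    site: {
        "vs_forecast": pct_reduction(row["forecast"], row["fused_qmse"]),
        "vs_retrieval": pct_reduction(row["retrieval"], row["fused_qmse"]),
    }
    for site, row in rmse_table.items()
}

for site, r in reductions.items():
    print(f"{site}: {r['vs_forecast']:.1f}% vs forecast, {r['vs_retrieval']:.1f}% vs retrieval")
```

The reduction relative to the forecast is roughly a quarter at each site, while the margin over the retrieval is much smaller, so most of the fusion benefit over the retrieval shows up in calibration (ECE) and not in RMSE.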

[Figure: JR2 Pareto forecast performance. JR2 horizon-wise fusion gains relative to the forecast baseline in the RMSE–CRPS plane.]

[Figure: JR2 Pareto retrieval performance. JR2 horizon-wise fusion gains relative to the retrieval baseline in the RMSE–CRPS plane.]

Implementation

Code: github.com/lpoly/IGARSS-2026

  • Python — full experimental pipeline
  • NGBoost — probabilistic CYGNSS retrieval
  • ARIMAX / SARIMAX — SMAP L4 forecasting
  • NumPy / Pandas / SciPy — feature handling, distribution computations, evaluation
  • Matplotlib — quantitative analysis and figures

Publication Details

Conference: IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2026

Status: Accepted

Paper: "Multi-Source Uncertainty Aware Fusion for Soil Moisture Estimation"