Multi-Source Uncertainty Aware Fusion for Soil Moisture Estimation
Brief
Combines forecasts of SMAP L4 soil moisture (from a time-series model that bridges the product's latency) with CYGNSS retrievals (timely but noisy and sparse satellite observations) into a single, calibrated soil-moisture estimate with uncertainty quantification. Fusion uses optimal transport to blend the full predictive distributions rather than averaging point estimates.
The Problem
SMAP L4 is a data-assimilation product that provides high-quality soil moisture estimates, but with a latency of about 2.5 days: by the time the data are available, conditions have already changed. CYGNSS provides instantaneous measurements from satellite observations, but it passes over any given location only sparsely and its measurements carry observation noise.
The goal is to combine these two complementary information sources: leverage SMAP's temporal coherence and bias correction, while incorporating CYGNSS's real-time responsiveness. We fuse the full predictive distributions using a W₂ barycenter, with multiple weighting strategies to balance the two sources optimally.
Method
Forecasting Leg: SMAP L4 Time-Series
An ARIMAX (autoregressive integrated moving average with exogenous inputs) model propagates historical SMAP L4 soil moisture forward in time, producing a Gaussian predictive distribution for each forecast horizon H ∈ {2,3,4,5} days ahead.
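To illustrate how a time-series model yields a Gaussian predictive distribution whose spread grows with the horizon, here is a minimal sketch using an AR(1) model with one exogenous input, fitted by least squares. It is a simplified stand-in for ARIMAX, not the configuration used in the paper: the model order, the synthetic series, the exogenous driver, and the function names `fit_arx1` and `forecast_arx1` are all assumptions, and parameter uncertainty is ignored.

```python
import numpy as np

def fit_arx1(y, x):
    """Least-squares fit of y[t] = c + phi*y[t-1] + beta*x[t] + eps[t]."""
    design = np.column_stack([np.ones(len(y) - 1), y[:-1], x[1:]])
    coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
    resid = y[1:] - design @ coef
    sigma2 = resid @ resid / (len(resid) - design.shape[1])  # innovation variance
    return coef, sigma2

def forecast_arx1(coef, sigma2, y_last, x_future):
    """Iterate the recursion; return the Gaussian mean and std for each step ahead."""
    c, phi, beta = coef
    mean, var = y_last, 0.0
    means, stds = [], []
    for x_t in x_future:
        mean = c + phi * mean + beta * x_t
        var = phi**2 * var + sigma2  # forecast variance accumulates with horizon
        means.append(mean)
        stds.append(np.sqrt(var))
    return np.array(means), np.array(stds)

# Synthetic daily series (hypothetical numbers, for illustration only)
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)  # stand-in exogenous driver
y = np.empty(n)
y[0] = 0.20
for t in range(1, n):
    y[t] = 0.04 + 0.8 * y[t - 1] + 0.01 * x[t] + rng.normal(0.0, 0.005)

coef, sigma2 = fit_arx1(y, x)
x_future = np.zeros(5)  # assumed future values of the exogenous input
mu, sd = forecast_arx1(coef, sigma2, y[-1], x_future)
for h in (2, 3, 4, 5):  # horizons used in the text
    print(f"H={h}: mean={mu[h - 1]:.4f}, std={sd[h - 1]:.4f}")
```

Each horizon gets its own mean and standard deviation, so the forecast at H=5 is wider than the one at H=2. A full ARIMAX fit would add differencing and moving-average terms on top of this recursion.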
Retrieval Leg: CYGNSS Probabilistic Regression
An NGBoost (Natural Gradient Boosting) model is trained on CYGNSS delay–Doppler observations to produce a direct, real-time estimate of soil moisture as a probability distribution. This source responds to current satellite data but is noisier.
Fusion: Wasserstein-2 Barycenter
Rather than averaging the two predictions as numbers, we average them as probability distributions using a W₂ barycenter (weighted average of quantile functions). This preserves uncertainty structure and enables calibrated prediction intervals.
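For one-dimensional distributions, the quantile-averaging rule can be sketched in a few lines. When both inputs are Gaussian, averaging quantile functions reduces to averaging the means and the standard deviations with the same weight. The convention that `w` weights the retrieval and `1 - w` the forecast, the function names, and the numbers below are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

def w2_barycenter_gaussian(mu_f, sd_f, mu_r, sd_r, w):
    """W2 barycenter of two 1-D Gaussians.

    Assumption: w weights the retrieval and (1 - w) the forecast.
    """
    mu = (1.0 - w) * mu_f + w * mu_r
    sd = (1.0 - w) * sd_f + w * sd_r
    return mu, sd

def w2_barycenter_quantiles(q_f, q_r, w):
    """General 1-D case: average quantiles evaluated on a shared probability grid."""
    return (1.0 - w) * np.asarray(q_f) + w * np.asarray(q_r)

# Hypothetical forecast and retrieval distributions (m^3/m^3)
mu_f, sd_f = 0.18, 0.030
mu_r, sd_r = 0.22, 0.020
w = 0.6

mu_b, sd_b = w2_barycenter_gaussian(mu_f, sd_f, mu_r, sd_r, w)

# The same result obtained by averaging quantile functions directly
probs = np.linspace(0.05, 0.95, 19)
q_b = w2_barycenter_quantiles(
    norm.ppf(probs, mu_f, sd_f), norm.ppf(probs, mu_r, sd_r), w
)

lo, hi = norm.ppf([0.05, 0.95], mu_b, sd_b)
print(f"barycenter: mean={mu_b:.3f}, std={sd_b:.3f}, 90% interval=({lo:.3f}, {hi:.3f})")
```

Unlike a linear pool, which mixes the two densities and can become bimodal when the sources disagree, the barycenter of two Gaussians is again a single Gaussian.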
Weight Selection: Five strategies were tuned on validation data, and the resulting weights were then held fixed for test evaluation:
- Equal: 50–50 blend.
- RMSE-tuned: Optimizes for lowest prediction error (but may overfit).
- CRPS-tuned: Optimizes CRPS (Continuous Ranked Probability Score), a metric that rewards sharp, calibrated distributions.
- qMSE-tuned: Minimizes mean squared error in quantile space, balancing prediction accuracy against interval width.
- L₂-CRPS: Linear pooling baseline using CRPS.
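The tuned strategies share one pattern: score candidate weights on validation data and keep the best one. The sketch below does this for the CRPS criterion with Gaussian inputs, using the closed-form Gaussian CRPS. The grid resolution, the synthetic validation set, the convention that `w` weights the retrieval, and the name `tune_weight` are assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sd, y):
    """Closed-form CRPS of N(mu, sd^2) against observation y (lower is better)."""
    z = (y - mu) / sd
    return sd * (z * (2.0 * norm.cdf(z) - 1.0) + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

def tune_weight(mu_f, sd_f, mu_r, sd_r, y, grid=None):
    """Grid-search the barycenter weight that minimizes mean CRPS on validation data."""
    grid = np.linspace(0.0, 1.0, 101) if grid is None else np.asarray(grid)
    scores = np.array([
        crps_gaussian((1 - w) * mu_f + w * mu_r, (1 - w) * sd_f + w * sd_r, y).mean()
        for w in grid
    ])
    return grid[scores.argmin()], scores

# Synthetic validation set (hypothetical): the retrieval is less noisy than the forecast
rng = np.random.default_rng(1)
n = 400
y_val = rng.uniform(0.10, 0.30, size=n)
sd_f = np.full(n, 0.030)
sd_r = np.full(n, 0.020)
mu_f = y_val + rng.normal(0.0, sd_f)
mu_r = y_val + rng.normal(0.0, sd_r)

w_star, scores = tune_weight(mu_f, sd_f, mu_r, sd_r, y_val)
print(f"selected w = {w_star:.2f}, validation CRPS = {scores.min():.4f}")
```

Swapping `crps_gaussian` for a different validation loss (for example squared error of the fused mean) gives the other tuned variants. The selected weight is then frozen before any test data are scored.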
Results at Horizon H=3 (3-day forecast)
Experiments were conducted at three soil moisture monitoring stations (jr1, jr2, jr3) of the SoilSCAPE network in New Mexico, which provide in-situ reference measurements for validation.
Evaluation Metrics:
- RMSE: Root mean squared error (m³/m³). Measures point-prediction accuracy.
- CRPS: Continuous ranked probability score. Measures both the sharpness and the calibration of the predictive distribution; lower is better.
- ECE: Expected calibration error. The gap between predicted and observed coverage of prediction intervals; lower means better calibrated.
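As a rough illustration of the coverage-based calibration metric, the sketch below compares nominal and observed coverage of central prediction intervals for Gaussian predictions and averages the absolute gaps. The exact ECE definition used in the paper (which levels, how they are averaged) is not stated here, so the nine nominal levels and the name `coverage_ece` are assumptions.

```python
import numpy as np
from scipy.stats import norm

def rmse(mu, y):
    """Root mean squared error of the predictive means."""
    return float(np.sqrt(np.mean((np.asarray(mu) - np.asarray(y)) ** 2)))

def coverage_ece(mu, sd, y, levels=None):
    """Mean absolute gap between nominal and observed central-interval coverage."""
    levels = np.linspace(0.1, 0.9, 9) if levels is None else np.asarray(levels)
    pit = norm.cdf((y - mu) / sd)  # probability integral transform
    # A central interval at nominal level p covers y when |PIT - 0.5| <= p / 2
    observed = np.array([np.mean(np.abs(pit - 0.5) <= p / 2.0) for p in levels])
    return float(np.mean(np.abs(observed - levels)))

# Synthetic check (hypothetical data): same means, two different claimed spreads
rng = np.random.default_rng(2)
n = 5000
mu = rng.uniform(0.10, 0.30, size=n)
sd_true = 0.02
y = mu + rng.normal(0.0, sd_true, size=n)

ece_calibrated = coverage_ece(mu, sd_true, y)           # correct spread
ece_overconfident = coverage_ece(mu, sd_true / 3.0, y)  # intervals too narrow
print(f"RMSE={rmse(mu, y):.4f}, ECE calibrated={ece_calibrated:.3f}, "
      f"ECE overconfident={ece_overconfident:.3f}")
```

Both sets of predictions have the same RMSE because they share the same means; only the calibration metric separates them, which is why accuracy and calibration are reported side by side.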
| Site | Method | Weight (w) | RMSE (m³/m³) | CRPS (m³/m³) | ECE |
|---|---|---|---|---|---|
| JR1 | CYGNSS Retrieval | n/a | 0.0280 | 0.0153 | 0.077 |
| JR1 | SMAP Forecast | n/a | 0.0375 | 0.0193 | 0.057 |
| JR1 | W₂ (Equal) | 0.5000 | 0.0290 | 0.0153 | 0.054 |
| JR1 | W₂ (qMSE) | 0.6384 | 0.0278 | 0.0148 | 0.019 |
| JR2 | CYGNSS Retrieval | n/a | 0.0193 | 0.0106 | 0.102 |
| JR2 | SMAP Forecast | n/a | 0.0244 | 0.0117 | 0.127 |
| JR2 | W₂ (Equal) | 0.5000 | 0.0184 | 0.0093 | 0.086 |
| JR2 | W₂ (qMSE) | 0.6632 | 0.0178 | 0.0093 | 0.030 |
| JR3 | CYGNSS Retrieval | n/a | 0.0243 | 0.0132 | 0.129 |
| JR3 | SMAP Forecast | n/a | 0.0317 | 0.0161 | 0.097 |
| JR3 | W₂ (Equal) | 0.5000 | 0.0247 | 0.0125 | 0.059 |
| JR3 | W₂ (qMSE) | 0.5949 | 0.0240 | 0.0122 | 0.014 |
Key Findings:
- qMSE weighting is best at all three sites: It achieves the lowest RMSE at jr1, jr2, and jr3 and the lowest CRPS at jr1 and jr3 (tied with equal weighting at jr2), and it cuts ECE to 0.019, 0.030, and 0.014, compared with 0.054, 0.086, and 0.059 for equal weighting.
- JR2 best case: qMSE fusion achieves an RMSE of 0.0178 (vs. 0.0244 for the forecast alone and 0.0193 for the retrieval alone) with an ECE of only 0.030, so predicted interval coverage stays close to observed coverage.
- Complementarity confirmed: Neither source dominates on every metric. The retrieval has lower RMSE and CRPS at all three sites, while the forecast is better calibrated at JR1 and JR3. qMSE-weighted fusion improves on both sources in accuracy and calibration by combining SMAP temporal structure with CYGNSS real-time observations.
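The size of the accuracy gain can be read off the results table as a relative RMSE reduction. The short calculation below uses only the H=3 RMSE values reported in the table; the dictionary layout and the helper name `relative_reduction` are illustrative choices.

```python
# H=3 RMSE values copied from the results table (m^3/m^3)
rmse_table = {
    "JR1": {"retrieval": 0.0280, "forecast": 0.0375, "fused_qmse": 0.0278},
    "JR2": {"retrieval": 0.0193, "forecast": 0.0244, "fused_qmse": 0.0178},
    "JR3": {"retrieval": 0.0243, "forecast": 0.0317, "fused_qmse": 0.0240},
}

def relative_reduction(baseline, fused):
    """Percentage by which the fused RMSE falls below a single-source baseline."""
    return 100.0 * (baseline - fused) / baseline

gains = {}
for site, r in rmse_table.items():
    gains[site] = {
        "vs_forecast": relative_reduction(r["forecast"], r["fused_qmse"]),
        "vs_retrieval": relative_reduction(r["retrieval"], r["fused_qmse"]),
    }
    print(f"{site}: {gains[site]['vs_forecast']:.1f}% vs forecast, "
          f"{gains[site]['vs_retrieval']:.1f}% vs retrieval")
```

At JR2, for example, the fused RMSE is roughly 27% below the forecast alone and about 8% below the retrieval alone. The gain over the retrieval is smaller at JR1 and JR3, where most of the benefit of fusion shows up in calibration.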
Figure: JR2 horizon-wise fusion gains relative to the forecast baseline in the RMSE–CRPS plane.
Figure: JR2 horizon-wise fusion gains relative to the retrieval baseline in the RMSE–CRPS plane.
Implementation
Code: github.com/lpoly/IGARSS-2026
- Python — full experimental pipeline
- NGBoost — probabilistic CYGNSS retrieval
- ARIMAX / SARIMAX — SMAP L4 forecasting
- NumPy / Pandas / SciPy — feature handling, distribution computations, evaluation
- Matplotlib — quantitative analysis and figures
Publication Details
Conference: IEEE International Geoscience and Remote Sensing Symposium (IGARSS) 2026
Status: Accepted
Paper: "Multi-Source Uncertainty Aware Fusion for Soil Moisture Estimation"