The streamflow model.
Seven-seed BF16 ensemble, evaluated under the same protocol as the published baselines we compare against.
Public benchmark, all figures median Nash-Sutcliffe Efficiency (medNSE): 0.830 on a k-fold prediction-in-ungauged-basins evaluation. On a held-out test period it reaches 0.874, ahead of the published Kratzert (2019) LSTM (0.82) on the matched protocol; on 181 held-out Canadian basins, 0.894. Across the full 16,299-basin Caravan assembly, 0.804.
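For reference, Nash-Sutcliffe Efficiency is 1 minus the ratio of the squared simulation error to the squared deviation of the observations from their own mean; it is computed per basin and then summarized by the median across basins. A minimal sketch in Python, assuming NumPy; the function names `nse` and `median_nse` and the dict-of-series input layout are illustrative assumptions, not the evaluation code behind the figures above:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency for one basin.

    Time steps where either series is NaN are skipped (an assumption;
    the source does not state how gaps are handled).
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    mask = ~np.isnan(obs) & ~np.isnan(sim)
    obs, sim = obs[mask], sim[mask]
    if obs.size == 0:
        return float("nan")
    denom = np.sum((obs - obs.mean()) ** 2)
    if denom == 0.0:
        # Constant observations: NSE is undefined.
        return float("nan")
    return float(1.0 - np.sum((obs - sim) ** 2) / denom)

def median_nse(per_basin):
    """Median NSE over basins; per_basin maps basin id -> (obs, sim)."""
    scores = [nse(obs, sim) for obs, sim in per_basin.values()]
    return float(np.nanmedian(scores))

# Hypothetical toy data: three basins with three time steps each.
example = {
    "basin_a": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]),  # perfect simulation
    "basin_b": ([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]),  # predicts the mean
    "basin_c": ([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]),  # one step off by 1
}
example_median = median_nse(example)
```

A score of 1 means a perfect match, and 0 means the simulation is no better than predicting the observed mean. The median is used as the summary because per-basin NSE is unbounded below, so a few very poor basins would dominate a mean.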
Across the regional subsets we report the full per-region medNSE distribution rather than a single headline number, with worst-decile basins surfaced. On like-for-like evaluation we are in line with or ahead of matched published protocols, with one honest exception: a strong native-forcing regional baseline still leads on its home turf. We report that too.
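One way to produce that kind of per-region report is to summarize each region's per-basin NSE scores by quantiles and list the basins at or below the 10th percentile. A minimal sketch in Python, assuming NumPy; the function name `region_summary`, the input layout, the chosen quantiles, and the "at or below the 10th percentile" reading of worst decile are all illustrative assumptions rather than our reporting pipeline:

```python
import numpy as np

def region_summary(scores_by_region, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Summarize per-basin NSE scores for each region.

    scores_by_region maps region name -> list of (basin_id, nse) pairs.
    Returns, per region, the basin count, the requested quantiles, and
    the ids of basins at or below the 10th percentile.
    """
    summary = {}
    for region, pairs in scores_by_region.items():
        # Drop basins whose score is undefined (NaN).
        valid = [(basin, score) for basin, score in pairs if not np.isnan(score)]
        if not valid:
            summary[region] = {"n": 0, "quantiles": {}, "worst_decile": []}
            continue
        values = np.array([score for _, score in valid], dtype=float)
        p10 = float(np.quantile(values, 0.1))
        summary[region] = {
            "n": len(valid),
            "quantiles": {q: float(np.quantile(values, q)) for q in quantiles},
            "worst_decile": sorted(b for b, s in valid if s <= p10),
        }
    return summary

# Hypothetical toy region: ten basins with scores 0.0, 0.1, ..., 0.9.
toy = {"region_x": [(f"b{i}", i / 10) for i in range(10)]}
report = region_summary(toy)
```

Reporting the quantiles alongside the worst-decile basin ids keeps weak basins visible instead of letting a single median hide them.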
Built on public benchmark data: the Caravan v1.6 multi-source basin assembly, version-pinned and spanning 16,299 basins. Reproducibility details are available to research collaborators on request.