§ Chapter III · Platform · Three live · more in development

Environmental intelligence, in production.

A family of prediction systems, each with its own model. Three are live today: streamflow, avalanche danger, and water main-break prediction. More are in development. Built on open public data and documented end to end.

§ III.1 HydroField · § III.2 AvalancheWatch · § III.3 Water main-break · § III.4 In development

§ III.1 — HydroField Streamflow · global

HydroField streamflow prediction.

A global streamflow-prediction model trained on the Caravan v1.6 multi-source basin assembly (16,299 basins across six Caravan datasets). It is evaluated on a sacred test set of 626 held-out basins that was locked before training began.
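One common way to make a "locked" test set checkable is to record a cryptographic digest of its basin list at lock time and verify it before every evaluation. The page does not say how the lock is enforced, so this is an illustration only; the helper names and basin ids below are hypothetical.

```python
import hashlib


def fingerprint(basin_ids: list[str]) -> str:
    """SHA-256 hex digest of the sorted basin ids, so list order does not matter."""
    canonical = "\n".join(sorted(basin_ids)).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def assert_unchanged(basin_ids: list[str], locked_digest: str) -> None:
    """Raise if the test-basin list differs from the one recorded at lock time."""
    if fingerprint(basin_ids) != locked_digest:
        raise RuntimeError("test-basin list differs from the locked version")


# Hypothetical ids; in practice the digest is recorded once, before training starts.
locked = fingerprint(["basin_0003", "basin_0001", "basin_0002"])
assert_unchanged(["basin_0001", "basin_0002", "basin_0003"], locked)  # passes silently
```

Any later edit to the list, even a single swapped basin, changes the digest and stops the evaluation run.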

Status Live

III.1

The streamflow model.

Seven-seed BF16 ensemble, evaluated under the same protocol as the published baselines we compare against.
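A seed ensemble trains the same architecture several times from different random initializations and combines the members' outputs. The page does not say how the seven members are combined; this sketch assumes an unweighted per-timestep mean, and `ensemble_mean` is a hypothetical helper. BF16 (bfloat16) is a training-precision detail and is not modeled here.

```python
from statistics import fmean


def ensemble_mean(member_predictions: list[list[float]]) -> list[float]:
    """Combine per-timestep predictions from several seed runs by an unweighted mean.

    Assumes every member covers the same timesteps in the same order.
    """
    n_steps = len(member_predictions[0])
    if any(len(m) != n_steps for m in member_predictions):
        raise ValueError("all ensemble members must cover the same timesteps")
    # zip(*...) regroups the members' values timestep by timestep.
    return [fmean(values) for values in zip(*member_predictions)]


# Seven hypothetical seed runs over three timesteps, each offset slightly.
members = [[1.0 + 0.01 * s, 2.0, 3.0 - 0.01 * s] for s in range(7)]
print(ensemble_mean(members))
```

Averaging tends to cancel seed-specific noise, which is the usual reason to report an ensemble rather than a single run.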

Public benchmark: median Nash-Sutcliffe Efficiency (medNSE) of 0.830 on a k-fold prediction-in-ungauged-basins (PUB) evaluation. On a held-out test period it reaches 0.874, ahead of the published Kratzert (2019) LSTM (0.82) under the matched protocol; on 181 held-out Canadian basins it reaches 0.894. Across the full 16,299-basin Caravan assembly, medNSE is 0.804.
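For reference, NSE for one basin is 1 - Σ(Q_obs - Q_sim)² / Σ(Q_obs - mean(Q_obs))²: a score of 1 is a perfect fit, and 0 means the model is no better than always predicting the observed mean. medNSE is the median of the per-basin scores. A minimal plain-Python sketch with made-up flows:

```python
from statistics import fmean, median


def nse(observed: list[float], simulated: list[float]) -> float:
    """Nash-Sutcliffe Efficiency for one basin's paired flow series."""
    if len(observed) != len(simulated):
        raise ValueError("observed and simulated series must be the same length")
    mean_obs = fmean(observed)
    # Model error (numerator) relative to the spread of observations around their mean.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - sse / sst


def median_nse(basins: dict[str, tuple[list[float], list[float]]]) -> float:
    """medNSE: the median of per-basin NSE scores."""
    return median([nse(obs, sim) for obs, sim in basins.values()])


obs = [1.0, 2.0, 3.0, 4.0]
basins = {
    "perfect": (obs, [1.0, 2.0, 3.0, 4.0]),    # NSE = 1.0
    "mean_only": (obs, [2.5, 2.5, 2.5, 2.5]),  # NSE = 0.0
    "close": (obs, [1.5, 2.0, 3.0, 3.5]),      # NSE = 0.9
}
print(median_nse(basins))
```

Taking the median rather than the mean keeps a handful of badly behaved basins (NSE can go far below zero) from dominating the headline.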

Across the regional subsets we report the full per-region NSE distribution rather than a single headline number, with the worst-decile basins surfaced. On like-for-like evaluation we are in line with or ahead of the matched published protocols, with one honest exception: a strong regional baseline run on its native forcing data still leads on its home turf. We report that too.
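A minimal sketch of reporting each region's NSE distribution with its worst-decile basins surfaced, rather than one pooled figure. It assumes per-basin NSE scores are already computed and defines the worst decile as the lowest-scoring 10% of basins by count (at least one); the region and basin names are made up.

```python
import math
from statistics import median


def region_report(nse_by_basin: dict[str, float]) -> dict[str, object]:
    """Summarize one region: basin count, median and minimum NSE, worst-decile basins."""
    if not nse_by_basin:
        raise ValueError("region has no scored basins")
    ranked = sorted(nse_by_basin.items(), key=lambda item: item[1])  # worst first
    n_worst = max(1, math.ceil(len(ranked) / 10))
    return {
        "n_basins": len(ranked),
        "median_nse": median([score for _, score in ranked]),
        "min_nse": ranked[0][1],
        "worst_decile": [basin for basin, _ in ranked[:n_worst]],
    }


def report_by_region(scores: dict[str, dict[str, float]]) -> dict[str, dict[str, object]]:
    """Apply region_report to every region instead of pooling all basins together."""
    return {region: region_report(basins) for region, basins in scores.items()}


scores = {
    "region_a": {"a1": 0.91, "a2": 0.35, "a3": 0.78, "a4": 0.84, "a5": -0.12},
    "region_b": {"b1": 0.88, "b2": 0.72, "b3": 0.93},
}
for region, summary in report_by_region(scores).items():
    print(region, summary)
```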

Built on public benchmark data: the Caravan v1.6 multi-source basin assembly, pinned to that version and spanning 16,299 basins. Reproducibility details are available to research collaborators on request.

01 Public benchmark medNSE · k-fold prediction in ungauged basins 0.830

02 medNSE · 181 held-out Canadian basins, zero training overlap 0.894

03 medNSE · held-out test period (vs Kratzert 2019 · 0.82) 0.874

04 Training set · Caravan v1.6 multi-source basins 16,299

→Methodology & reproduction chain

§ III.2 — AvalancheWatch North America · daily

AvalancheWatch avalanche-danger predictions.

Operational daily avalanche-danger predictions across 46 North American forecast zones. A new rating posts every morning. The first product Elysium Fields AI shipped at production scale.

Status Live · daily public ratings

III.2

AvalancheWatch.

Every morning, the model issues a fresh five-level danger rating for each of 46 North American forecast zones. It reaches 73.23% exact-level accuracy, matching the roughly 73% of the Swiss SLF operational benchmark.

Drawing on public snow and weather telemetry plus a curated avalanche-observation history, the model issues a single five-level rating per zone per day on the public North American danger scale.

We treat the system as a working operational service, not a research demo. That means CI on the data pipeline, monitored deploys, rate-limited public endpoints, and a continuous bug-audit cadence. The last full forensic audit closed in May 2026 with its critical findings remediated, and each release must first pass a published test-coverage gate.

01 Exact five-level danger accuracy · matching Swiss SLF baseline (~73%) 73.23%

02 Binary “considerable-or-higher” safety-alert accuracy 85.7%

03 North American forecast zones · updated daily 46

04 Danger-rating scale · public North American five-level 1–5
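The two accuracy figures above can be read as simple agreement rates over zone-days. A minimal sketch, assuming each prediction is compared with one reference rating per zone per day (the page does not say whether the reference is the human-issued forecast or a verified rating) and that "considerable-or-higher" means level 3 or above on the five-level scale (1 Low, 2 Moderate, 3 Considerable, 4 High, 5 Extreme). The function names are hypothetical.

```python
CONSIDERABLE = 3  # level 3 on the North American scale: 1 Low ... 5 Extreme


def _check(predicted: list[int], reference: list[int]) -> None:
    """Require equal-length, non-empty series of levels in 1..5."""
    if not predicted or len(predicted) != len(reference):
        raise ValueError("need equal-length, non-empty rating series")
    if any(not 1 <= level <= 5 for level in predicted + reference):
        raise ValueError("danger levels must be integers from 1 to 5")


def exact_level_accuracy(predicted: list[int], reference: list[int]) -> float:
    """Share of zone-days where the predicted level equals the reference level."""
    _check(predicted, reference)
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def alert_accuracy(predicted: list[int], reference: list[int],
                   threshold: int = CONSIDERABLE) -> float:
    """Share of zone-days where both sides agree on 'at or above threshold'."""
    _check(predicted, reference)
    hits = sum((p >= threshold) == (r >= threshold) for p, r in zip(predicted, reference))
    return hits / len(reference)


# Eight hypothetical zone-days.
predicted = [1, 2, 3, 4, 3, 2, 5, 1]
reference = [1, 3, 3, 4, 2, 2, 4, 2]
print(exact_level_accuracy(predicted, reference))  # 0.5
print(alert_accuracy(predicted, reference))        # 0.75
```

Because every exact match is also a binary match, the binary score can never fall below the exact-level score on the same zone-days.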

→Methodology & signal sources

§ III.3 — Water main-break Utilities · validated

Water main-break prediction.

Predicting which buried water mains will fail next, so utilities can replace pipe before it breaks. Validated across three independent North American utilities on out-of-time tests.

Status Live

III.3

Which pipe breaks next.

A calibrated risk score for every segment in a utility's network — validated across three independent cities on out-of-time data.
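Out-of-time validation means fitting only on records up to a cutoff date and scoring on what happened after it. A minimal sketch; the record layout, field names, and cutoff date are hypothetical, and a production split would also need to keep post-cutoff information out of the training features.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SegmentRecord:
    """Hypothetical row: one pipe segment observed over one time window."""
    segment_id: str
    window_end: date
    broke: bool


def out_of_time_split(records: list[SegmentRecord], cutoff: date
                      ) -> tuple[list[SegmentRecord], list[SegmentRecord]]:
    """Rows ending on or before the cutoff train the model; later rows test it."""
    train = [r for r in records if r.window_end <= cutoff]
    test = [r for r in records if r.window_end > cutoff]
    return train, test


records = [
    SegmentRecord("seg-001", date(2019, 12, 31), False),
    SegmentRecord("seg-001", date(2020, 12, 31), True),
    SegmentRecord("seg-002", date(2019, 12, 31), True),
    SegmentRecord("seg-002", date(2021, 12, 31), False),
]
train, test = out_of_time_split(records, cutoff=date(2019, 12, 31))
print(len(train), len(test))
```

Splitting by time rather than at random keeps the test closer to real use: the model is always asked about breaks that had not yet happened when it was fit.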

Across the Calgary, Kitchener, and DC Water networks, the model ranks every pipe segment by how likely it is to fail next, validated on out-of-time data each utility had never seen. AUROC ranges from 0.875 to 0.910 and, crucially, the probabilities are calibrated (expected calibration error ≤ 0.008): among segments given a 5% break risk, roughly five in a hundred actually break.
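The two quality numbers in that paragraph measure different things: AUROC is how well the scores rank breaks above non-breaks, and expected calibration error (ECE) is how far predicted probabilities sit from observed break rates. A minimal sketch of both; the ten equal-width bins are a common ECE convention assumed here rather than one the page states, and the pairwise AUROC loop favors clarity over speed on a full network.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random break is scored above a random non-break (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one break and one non-break")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def expected_calibration_error(probs: list[float], labels: list[int], n_bins: int = 10) -> float:
    """Size-weighted gap between mean predicted risk and observed break rate per bin."""
    if not probs or len(probs) != len(labels):
        raise ValueError("need equal-length, non-empty inputs")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; a probability of exactly 1.0 goes in the last bin.
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for b in bins:
        if b:
            mean_p = sum(p for p, _ in b) / len(b)
            break_rate = sum(y for _, y in b) / len(b)
            ece += len(b) / len(probs) * abs(break_rate - mean_p)
    return ece


probs = [0.9, 0.8, 0.3, 0.1]
labels = [1, 1, 0, 0]
print(auroc(probs, labels))  # 1.0

# Twenty hypothetical segments all scored at 5% risk, one of which breaks:
calib_probs = [0.05] * 20
calib_labels = [1] + [0] * 19
print(expected_calibration_error(calib_probs, calib_labels))  # close to 0
```

A model can rank well (high AUROC) and still be miscalibrated, which is why the page reports both.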

In practice it turns a utility's existing records into a ranked, replace-first list, so capital goes where the risk actually is rather than where the last break happened. The same approach has held up across three independent cities.
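The ranked list and the top-decile capture figure on this page can be sketched together. This assumes the decile is counted by number of segments (a utility might instead weight by pipe length); the segment ids and risk values are invented.

```python
import math


def replace_first_list(risk_by_segment: dict[str, float]) -> list[str]:
    """Segment ids ordered from highest to lowest predicted break risk."""
    return sorted(risk_by_segment, key=lambda seg: risk_by_segment[seg], reverse=True)


def top_decile_capture(risk_by_segment: dict[str, float], broke: set[str]) -> float:
    """Share of observed breaks that fall in the top 10% of segments by predicted risk."""
    if not broke:
        raise ValueError("need at least one observed break")
    ranked = replace_first_list(risk_by_segment)
    top = set(ranked[: max(1, math.ceil(len(ranked) / 10))])
    return len(top & broke) / len(broke)


# Twenty hypothetical segments with rising risk; four of them break.
risk = {f"s{i:02d}": i / 100 for i in range(20)}
broke = {"s19", "s18", "s05", "s02"}
print(replace_first_list(risk)[:3])     # ['s19', 's18', 's17']
print(top_decile_capture(risk, broke))  # 0.5
```

Under random ordering the top decile would capture about 10% of breaks, so the capture share is a direct read on how much the ranking concentrates risk.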

01 Independent utilities validated · out-of-time sacred test 3

02 Share of real breaks captured in the top decile of predicted risk ~54–67%

03 AUROC range across the three cities 0.875–0.910

04 Probability calibration · expected calibration error ≤0.008

→Methodology & calibration

§ III.4 — In development Quietly · without timelines

More in quiet work.

More environmental-risk systems are in development. We don't name them or commit to timelines until they meet the same documentation-and-evaluation bar as the live ones — then they show up here.

Status Quiet work

§ Correspond

Water & utilities, snow and flood managers,
fire, drought & risk planners.

Water utilities, ski and avalanche operations, flood and emergency managers, wildfire and drought planners, insurers, and the academic collaborators who benchmark this work: whether you want an operational deployment where a system is already live or want to help shape the ones we're building next, the door is open.

[email protected]