Verification now reflects what we can actually prove.
HazardPulse freezes every live forecast into replay artifacts, tracks raw append-only ledgers, and now separates exact model benchmarks, pending maturity windows, and scoring backlogs. If a live model is not scored yet, this page says so directly.
Built 2026-04-15T20:44:14Z · Score as of 2026-04-15T20:43:04Z
Replay artifacts preserved across all live hazards.
Matured windows waiting for an evaluator or scoring run.
Prev-hash continuity failures in raw append-only ledgers.
Live model versions with an attached exact benchmark.
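A prev-hash continuity check can be sketched as follows. This is a minimal illustration, not HazardPulse's actual ledger format: the `prev_hash` and `payload` field names, the SHA-256 choice, and the canonical-JSON hashing are all assumptions made for the example.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Hash an entry's canonical JSON form (sorted keys, compact separators)."""
    payload = json.dumps(entry, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(ledger: list, payload: dict) -> None:
    """Append a record that points at the hash of the previous record (hypothetical schema)."""
    prev = entry_hash(ledger[-1]) if ledger else None
    ledger.append({"prev_hash": prev, "payload": payload})


def continuity_failures(ledger: list) -> list:
    """Return the indices of entries whose prev_hash does not match the preceding entry."""
    failures = []
    for i in range(1, len(ledger)):
        if ledger[i].get("prev_hash") != entry_hash(ledger[i - 1]):
            failures.append(i)
    return failures
```

Under these assumptions, editing any earlier record changes its hash, so the record that follows it is reported as a continuity failure.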
Control alerts
- 131 matured forecast windows are waiting for scoring.
- Earthquake: no exact benchmark is attached to the current live model version.
- Tornado: no exact benchmark is attached to the current live model version.
Hazard by hazard
These cards tell you whether each live model has exact scores, only related research benchmarks, or just frozen forecasts waiting for scoring.
Earthquake
Waiting: Prospective earthquake logging is live; the 30-day windows have not matured yet.
Keep freezing every earthquake forecast. Once the first 30-day windows mature, run the prospective scorer and use those scores to tune thresholds and calibration.
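The maturity test behind this status can be sketched as below. It assumes that a window starts when the forecast is frozen and that maturity is judged against the page's "Score as of" timestamp; the function and parameter names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 30-day window is measured from the moment the forecast was frozen.
EARTHQUAKE_WINDOW = timedelta(days=30)


def is_matured(frozen_at: datetime, as_of: datetime,
               window: timedelta = EARTHQUAKE_WINDOW) -> bool:
    """A forecast window is matured once the full window has elapsed by `as_of`."""
    return as_of >= frozen_at + window


# "Score as of" timestamp shown on this page.
score_as_of = datetime(2026, 4, 15, 20, 43, 4, tzinfo=timezone.utc)
```

Only forecasts for which `is_matured` returns `True` would be handed to the prospective scorer; the rest stay frozen and unscored.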
Hurricane
Backlog: Matured hurricane forecasts exist, but no live advisory-to-outcome scorer is wired yet.
Implement an advisory-to-outcome scorer that joins frozen hurricane forecasts to realized 24-hour intensity change before using the model for calibration or promotion decisions.
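One possible shape for such a scorer is sketched below. The join key (`storm_id` plus advisory time), the record layout, and the 30 kt event threshold are assumptions made for illustration, not the definitions HazardPulse uses; the Brier score itself is the standard mean squared difference between forecast probability and binary outcome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrozenForecast:
    storm_id: str
    advisory_time: str   # ISO 8601 issue time of the advisory (hypothetical field)
    probability: float   # forecast probability that the event occurs


def score_forecasts(forecasts, realized_change_kt, threshold_kt=30.0):
    """Join frozen forecasts to realized 24-hour intensity change and compute a Brier score.

    `realized_change_kt` maps (storm_id, advisory_time) to the observed 24-hour
    change in knots. Forecasts without a realized outcome are counted, not scored.
    Returns (brier_score_or_None, n_scored, n_unmatched).
    """
    squared_errors = []
    unmatched = 0
    for f in forecasts:
        key = (f.storm_id, f.advisory_time)
        if key not in realized_change_kt:
            unmatched += 1
            continue
        # Assumption: the event is "intensity change at or above threshold_kt".
        outcome = 1.0 if realized_change_kt[key] >= threshold_kt else 0.0
        squared_errors.append((f.probability - outcome) ** 2)
    score = sum(squared_errors) / len(squared_errors) if squared_errors else None
    return score, len(squared_errors), unmatched
```

Reporting the unmatched count alongside the score keeps the backlog visible: a low Brier score computed over only a fraction of matured forecasts should not be read as proof of live performance.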
Tornado
Backlog: Matured tornado storm-object forecasts exist, but no live outcome scorer is wired yet.
Bind each frozen tornado storm-object forecast to a matched outcome definition and write a 24-hour scorer before using the live model for calibration or threshold changes.
Benchmark provenance
Exact benchmarks are safe to cite for the current live model version. Related benchmarks are useful for research context, but not as proof of live performance.
| Hazard | Type | Model | Metrics | Source updated |
|---|---|---|---|---|
| Earthquake | Related Research Benchmark | earthquake_honest_regional_suite | same-location AUC 0.797; global AUC 0.907 | -- |
| Hurricane | Exact Model Benchmark | hurricane_ri_v8_1 | AUC 0.938; Brier 0.034 | 2026-03-13T03:00:00Z |
| Tornado | Related Research Benchmark | hazardpulse_tornado_definitive_v1 | AUC 0.894 | 2026-03-31T04:07:24Z |
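The citation rule can be expressed as a small filter over the table rows. This is a sketch only: the dictionary layout and function names are hypothetical, and the rows simply mirror the table above.

```python
EXACT = "Exact Model Benchmark"

# Rows copied from the benchmark provenance table; the dict layout is an assumption.
BENCHMARKS = [
    {"hazard": "Earthquake", "type": "Related Research Benchmark",
     "model": "earthquake_honest_regional_suite"},
    {"hazard": "Hurricane", "type": "Exact Model Benchmark",
     "model": "hurricane_ri_v8_1"},
    {"hazard": "Tornado", "type": "Related Research Benchmark",
     "model": "hazardpulse_tornado_definitive_v1"},
]


def citable_hazards(benchmarks):
    """Hazards whose benchmark may be cited as evidence for the live model."""
    return [b["hazard"] for b in benchmarks if b["type"] == EXACT]


def hazards_without_exact(benchmarks):
    """Hazards that should raise a 'no exact benchmark' control alert."""
    exact = set(citable_hazards(benchmarks))
    return sorted({b["hazard"] for b in benchmarks} - exact)
```

With the rows above, only Hurricane is citable, and Earthquake and Tornado are the hazards flagged as lacking an exact benchmark, which matches the control alerts on this page.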
Storage and audit surfaces
This surface is intentionally stricter than marketing copy. A billion-dollar company needs a page that tells operators what is frozen, what is scored, what is only a research benchmark, and what still needs engineering work before it can influence model adjustment or promotion.
Use the evidence ledger for artifact-level traceability and this page for scoring readiness and benchmark discipline.