Verification

Verification now reflects what we can actually prove.

Name: HazardPulse Verification Status
Creator: HazardPulse

HazardPulse freezes every live forecast into replay artifacts, tracks raw append-only ledgers, and now separates exact model benchmarks, pending maturity windows, and scoring backlogs. If a live model is not scored yet, this page says so directly.

Built 2026-07-15T03:04:17Z · Score as of 2026-07-15T03:03:50Z

1561

Frozen forecasts

Replay artifacts preserved across all live hazards.

1137

Scoring backlog

Matured windows waiting for an evaluator or scoring run.

Hash mismatches

Prev-hash continuity failures in raw append-only ledgers.

Exact benchmarks

Live model versions with an attached exact benchmark.

Control alerts

1137 matured forecast windows are waiting for scoring.
Tornado: no exact benchmark is attached to the current live model version.
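
The headline counts can be rechecked from the per-hazard cards. The sketch below assumes the scoring backlog is matured windows minus scored windows, summed across hazards, and that frozen forecasts is the sum of replay counts. The page does not state these formulas, and the `HazardCounts` type and function names are illustrative, not part of HazardPulse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HazardCounts:
    replays: int  # frozen replay artifacts
    matured: int  # forecast windows past their horizon
    scored: int   # matured windows that already have a score


# Figures copied from the hazard cards on this page.
COUNTS = {
    "earthquake": HazardCounts(replays=412, matured=291, scored=290),
    "hurricane": HazardCounts(replays=52, matured=51, scored=0),
    "tornado": HazardCounts(replays=1097, matured=1085, scored=0),
}


def frozen_forecasts(counts):
    """Total replay artifacts across hazards."""
    return sum(c.replays for c in counts.values())


def scoring_backlog(counts):
    """Matured windows that still lack a score (assumed definition)."""
    return sum(c.matured - c.scored for c in counts.values())


print(frozen_forecasts(COUNTS))  # 1561
print(scoring_backlog(COUNTS))   # 1137
```

Under these assumptions the sums match the 1561 and 1137 shown above.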

Hazard by hazard

These cards tell you whether each live model has exact scores, only related research benchmarks, or just frozen forecasts waiting for scoring.

Earthquake

Scored

Prospective live scoring is active for matured earthquake windows.

Live model: eq_coherence_v1_0

Latest forecast: eq_fcst_20260715_0000

Storage: 412 replays · horizon 30 days

Maturity: 291 matured · 290 scored

Primary metric: AUC 0.709 · Brier 0.002

Related benchmark: same-location AUC 0.797 | global AUC 0.907

Raw ledger: 423 rows · 0 mismatches

Keep freezing every earthquake forecast. As further 30-day windows mature, run the prospective scorer and use those scores to tune thresholds and calibration.

Hurricane

Backlog

Matured hurricane forecasts exist, but no live advisory-to-outcome scorer is wired yet.

Live model: hurricane_ri_v8_1

Latest forecast: hu_fcst_20260526_1250

Storage: 52 replays · horizon 24 hours

Maturity: 51 matured · 0 scored

Primary metric: AUC 0.938 · Brier 0.034

Exact benchmark: AUC 0.938 · Brier 0.034

Raw ledger: Not implemented for this hazard yet

Implement an advisory-to-outcome scorer that joins frozen hurricane forecasts to realized 24-hour intensity change before using the model for calibration or promotion decisions.
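
A minimal sketch of what such a scorer could look like, under stated assumptions. The record fields (`storm_id`, `issued_at`, `probability`), the observation lookup keyed by storm and time, and the 30-knot event threshold are all hypothetical. The page does not define the forecast schema or the outcome event, so treat every name here as a placeholder.

```python
from datetime import datetime, timedelta

HORIZON = timedelta(hours=24)
RI_THRESHOLD_KT = 30.0  # assumed event definition; not stated on this page


def score_forecasts(forecasts, observations):
    """Join frozen forecasts to realized 24-hour intensity change.

    forecasts: list of dicts with storm_id, issued_at (datetime), probability.
    observations: dict mapping (storm_id, datetime) -> max wind in knots.
    Returns (scored_rows, mean_brier); mean_brier is None if nothing matched.
    """
    rows = []
    for f in forecasts:
        start = observations.get((f["storm_id"], f["issued_at"]))
        end = observations.get((f["storm_id"], f["issued_at"] + HORIZON))
        if start is None or end is None:
            continue  # outcome not observable yet; leave the forecast unscored
        outcome = 1.0 if end - start >= RI_THRESHOLD_KT else 0.0
        rows.append({**f, "outcome": outcome,
                     "sq_error": (f["probability"] - outcome) ** 2})
    brier = sum(r["sq_error"] for r in rows) / len(rows) if rows else None
    return rows, brier


# Made-up values for illustration only; not HazardPulse data.
t0 = datetime(2026, 5, 25, 12, 0)
demo_forecasts = [{"storm_id": "AL01", "issued_at": t0, "probability": 0.8}]
demo_obs = {("AL01", t0): 60.0, ("AL01", t0 + HORIZON): 95.0}
demo_rows, demo_brier = score_forecasts(demo_forecasts, demo_obs)
```

Forecasts without both endpoints are skipped rather than scored as misses, so they stay in the backlog until the outcome is available.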

Tornado

Backlog

Matured tornado storm-object forecasts exist, but no live outcome scorer is wired yet.

Live model: tornado_storm_v1_0

Latest forecast: to_fcst_20260715_0303

Storage: 1097 replays · horizon 24 hours

Maturity: 1085 matured · 0 scored

Related benchmark: AUC 0.894

Raw ledger: 1163 rows · 0 mismatches

Bind each frozen tornado storm-object forecast to a matched outcome definition and write a 24-hour scorer before using the live model for calibration or threshold changes.

Calibration scoreboard

Reliability of the deployed (calibrated) forecasts, measured against the platform's own realized outcomes. A perfectly calibrated forecaster sits on the dashed diagonal; ECE and Brier-skill are shown raw → calibrated.
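
For readers who want to reproduce these two metrics, here is a small sketch using standard textbook definitions. It assumes equal-width probability bins for ECE and the sample base rate as the reference forecast for Brier skill. The page does not say which binning or reference HazardPulse uses, so the numbers below may not be computed the same way.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill(probs, outcomes):
    """1 - BS / BS_ref, with the sample base rate as the reference (assumed)."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        return None  # undefined when every outcome is identical
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Sample-weighted mean gap between forecast probability and observed frequency."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, o))
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(o for _, o in members) / len(members)
        ece += len(members) / total * abs(mean_p - freq)
    return ece
```

A negative Brier skill means the forecast scores worse than always issuing the reference forecast; a skill near zero means it is roughly on par with it.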

Earthquake

ECE (raw → calibrated): 0.004 → 0.000

Brier skill: -1.910 → 0.009

Distribution drift: major (PSI 2.599)

Calibration sample: 3,393,000

Tornado

ECE (raw → calibrated): 0.173 → 0.000

Brier skill: -20.032 → -0.001

Distribution drift: major (PSI 2.013)

Calibration sample: 127,734

Hurricane

ECE (raw → calibrated): 0.027 → 0.027

Brier skill: -- → --

Distribution drift: insufficient data (PSI 0.000)

Calibration sample: 14
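
The distribution drift rows above report a population stability index (PSI). The sketch below uses the standard PSI formula over binned counts. The drift cut-offs (0.10 and 0.25) are a common rule of thumb, and the `min_sample` guard is a hypothetical stand-in for whatever produces the "insufficient data" label; neither is confirmed by this page.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population stability index over matching bins of two distributions."""
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Floor each proportion so empty bins do not produce log(0).
        e_frac = max(e / e_total, eps)
        a_frac = max(a / a_total, eps)
        value += (a_frac - e_frac) * math.log(a_frac / e_frac)
    return value


def drift_label(value, sample_size, min_sample=100):
    """Map a PSI value to a label; thresholds and min_sample are assumptions."""
    if sample_size < min_sample:
        return "insufficient data"
    if value < 0.10:
        return "minor"
    if value < 0.25:
        return "moderate"
    return "major"
```

Under the rule-of-thumb thresholds, PSI values above 2 such as those reported for earthquake and tornado fall well inside the "major" band.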

Benchmark provenance

Exact benchmarks are safe to cite for the current live model version. Related benchmarks are useful for research context, but not as proof of live performance.

Hazard	Type	Model	Metrics	Source updated
Earthquake	Related Research Benchmark	earthquake_honest_regional_suite	same-location AUC 0.797 | global AUC 0.907	--
Hurricane	Exact Model Benchmark	hurricane_ri_v8_1	AUC 0.938 | Brier 0.034	2026-03-13T03:00:00Z
Tornado	Related Research Benchmark	hazardpulse_tornado_definitive_v1	AUC 0.894	2026-03-31T04:07:24Z
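
One way to read the table: a benchmark counts as exact only when its model identifier equals the live model version. That rule is an assumption inferred from the rows above, not a documented HazardPulse check, and the function name is illustrative.

```python
# Live model versions and benchmark models as listed on this page.
LIVE_MODELS = {
    "earthquake": "eq_coherence_v1_0",
    "hurricane": "hurricane_ri_v8_1",
    "tornado": "tornado_storm_v1_0",
}

BENCHMARK_MODELS = {
    "earthquake": "earthquake_honest_regional_suite",
    "hurricane": "hurricane_ri_v8_1",
    "tornado": "hazardpulse_tornado_definitive_v1",
}


def benchmark_type(hazard):
    """Return 'exact' if the benchmark was run on the live model version (assumed rule)."""
    if BENCHMARK_MODELS[hazard] == LIVE_MODELS[hazard]:
        return "exact"
    return "related"


for hazard in LIVE_MODELS:
    print(hazard, benchmark_type(hazard))
```

Under this rule only the hurricane benchmark is exact, which agrees with the Type column.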

Storage and audit surfaces

Verification summary: /data/verification-summary.json

Replay index: /data/evidence/replay-index.json

Prediction ledger: /data/evidence/prediction-ledger.json

Earthquake raw chain: /data/earthquake-ledger.jsonl

Tornado raw chain: /data/tornado-ledger.jsonl
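
The raw chains can in principle be audited offline. The sketch below counts prev-hash continuity failures in a JSONL ledger. It assumes each row carries `hash` and `prev_hash` fields and that `hash` is a SHA-256 over the row's canonical JSON. The actual ledger schema is not documented on this page, so the field names and hashing scheme are hypothetical.

```python
import hashlib
import json


def row_hash(row):
    """SHA-256 of the row's canonical JSON, excluding its own hash field."""
    payload = {k: v for k, v in row.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def append_row(ledger, payload):
    """Append a payload, linking it to the previous row's hash."""
    prev = ledger[-1]["hash"] if ledger else None
    row = {**payload, "prev_hash": prev}
    row["hash"] = row_hash(row)
    ledger.append(row)
    return row


def count_mismatches(lines):
    """Count rows whose prev_hash differs from the preceding row's hash."""
    mismatches = 0
    prev = None
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines in the file
        row = json.loads(line)
        if prev is not None and row.get("prev_hash") != prev:
            mismatches += 1
        prev = row.get("hash")
    return mismatches
```

To audit a file, pass an open file handle as `lines`. This checks linkage only; recomputing `row_hash` for each row would additionally detect edits to a row's contents.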

This surface is intentionally stricter than marketing copy. A billion-dollar company needs a page that tells operators what is frozen, what is scored, what is only a research benchmark, and what still needs engineering work before it can influence model adjustment or promotion.

Use the evidence ledger for artifact-level traceability and this page for scoring readiness and benchmark discipline.