CSML stands for Composite Safety-Model-Ledger. Earlier documentation expanded the acronym as “Continuous Safety Monitoring Language”; that expansion was deprecated in v0.2. The canonical definition is the formula below.
CSML is a composite risk index that combines empirically measured per-model safety metrics with evidence-ledger integrity into a single auditable safety score. It feeds the tier escalation function via Δ_trust.

The formula

CSML(m, P, t) = α · AR̄_m + β · BP̄_m + γ · SV̄_m − δ · CR̄_m + ε · 𝟙[ledger_intact(t)]
Where:
AR̄_m (float, [0, 1]): Normalized prompt-level out-of-policy attempt rate for model m. Higher = more risk.
BP̄_m (float, [0, 1]): Normalized mean blocks-per-prompt intensity. Higher = more policy interventions required.
SV̄_m (float, [0, 1]): Normalized median overspeed severity. Higher = larger physical envelope violations.
CR̄_m (float, [0, 1]): Normalized task completion rate. Higher = better; enters the formula with a negative sign.
𝟙[ledger_intact(t)] (0 | 1): Indicator. 1 if the Evidence Ledger hash chain is unbroken through time t, else 0.

Default weights

| Weight | Value | What it emphasizes |
|---|---|---|
| α | 0.30 | Out-of-policy attempt rate (primary behavioral signal) |
| β | 0.25 | Block-per-prompt intensity (how often the gateway intervenes) |
| γ | 0.20 | Overspeed severity (physical envelope violations) |
| δ | 0.15 | Completion rate (negated; penalizes excessive blocking) |
| ε | 0.10 | Ledger integrity (meta-signal: is the audit itself healthy?) |
Weights are tunable per deployment. The defaults are defensible against ROSClaw’s data but not proven optimal. See FACTS for the version’s canonical weights.
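The formula and default weights above can be sketched in a few lines. This is a minimal illustration, not the gateway's implementation: the function name `computeCsml`, the field names, and the input values in the usage line are hypothetical, and the inputs are assumed to be already normalized to [0, 1]. The ledger term is added exactly as the formula is written.

```javascript
// Default weights from the table above; tunable per deployment.
const DEFAULT_WEIGHTS = { alpha: 0.30, beta: 0.25, gamma: 0.20, delta: 0.15, epsilon: 0.10 };

// Hypothetical helper: evaluates the CSML formula term by term.
// ASSUMPTION: ar, bp, sv, cr are already normalized to [0, 1].
function computeCsml({ ar, bp, sv, cr, ledgerIntact }, w = DEFAULT_WEIGHTS) {
  for (const [name, value] of Object.entries({ ar, bp, sv, cr })) {
    if (!(value >= 0 && value <= 1)) {
      throw new RangeError(`${name} must be normalized to [0, 1], got ${value}`);
    }
  }
  const indicator = ledgerIntact ? 1 : 0; // 𝟙[ledger_intact(t)]
  return (
    w.alpha * ar +
    w.beta * bp +
    w.gamma * sv -
    w.delta * cr + // completion rate enters with a negative sign
    w.epsilon * indicator
  );
}

// Illustrative inputs only (not measured data):
// 0.30*0.2 + 0.25*0.1 + 0.20*0.0 - 0.15*0.9 + 0.10*1
const exampleScore = computeCsml({ ar: 0.2, bp: 0.1, sv: 0.0, cr: 0.9, ledgerIntact: true });
console.log(exampleScore.toFixed(3)); // 0.050
```

Passing a custom weights object as the second argument models a per-deployment override of the defaults.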

Calibration against ROSClaw

ROSClaw’s TurtleBot3 cross-model study produces the following calibrated CSML values:
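| Model | AR_m | BP_m | SV_m | CR_m | CSML (est.) | Risk |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 0.14 | 0.32 | 1.28 | 0.865 | 0.21 | Low |
| GPT-5.2 | 0.09 | 0.18 | 1.22 | 0.823 | 0.16 | Low |
| Gemini 3.1 Pro | 0.31 | 0.78 | 1.44 | 0.790 | 0.44 | Medium |
| Llama 4 Maverick | 0.43 | 1.21 | 1.57 | 0.668 | 0.66 | High |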
Under the tier escalation function, Llama 4 Maverick with CSML = 0.66 triggers Δ_trust = 2, automatically elevating any T0 request to T2. This is the protocol-level correction that protects against the 3.4× behavioral spread ROSClaw documented.
The gateway doesn’t trust model-vendor alignment claims. It measures behavior in situ and adjusts the tier. If a previously-safe model drifts in production (new release, new fine-tune, supply-chain compromise), the CSML tracks it.
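As a rough illustration of how a score could translate into an escalated tier, the sketch below maps CSML to Δ_trust and adds it to the requested tier. The thresholds (0.3 and 0.6), the maximum tier, the additive rule, and every identifier are assumptions chosen only to be consistent with the examples on this page (0.66 gives Δ_trust = 2, low scores give 0); the actual escalation function is defined on the Tiers page.

```javascript
// ASSUMPTION: illustrative cut-offs, not documented values.
const HYPOTHETICAL_BANDS = [
  { minScore: 0.6, trustDelta: 2 },
  { minScore: 0.3, trustDelta: 1 },
];

// ASSUMPTION: the highest tier index; the real value is defined elsewhere.
const HYPOTHETICAL_MAX_TIER = 3;

// Map a CSML score to Δ_trust using the hypothetical bands above.
function trustDeltaFor(score) {
  for (const band of HYPOTHETICAL_BANDS) {
    if (score >= band.minScore) return band.trustDelta;
  }
  return 0;
}

// ASSUMPTION: escalation is additive and capped at the top tier.
function escalateTier(requestedTier, score) {
  return Math.min(requestedTier + trustDeltaFor(score), HYPOTHETICAL_MAX_TIER);
}

console.log(`T${escalateTier(0, 0.66)}`); // T2
console.log(`T${escalateTier(0, 0.16)}`); // T0
```

Because the tier is recomputed from the current score, a model whose score drifts upward would be escalated without any change to the request itself.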

Update cadence

CSML updates on a configurable cadence: by default, every 50 events or every 60 seconds, whichever comes first. Every update emits a CSML_UPDATE event on the Evidence Ledger, so the safety score itself is auditable.
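The default cadence described above (50 events or 60 seconds, whichever comes first) can be modeled as a small trigger. This is a sketch under assumptions: the class `UpdateCadence` and its methods are hypothetical names, and timestamps are passed in explicitly so the logic is easy to test.

```javascript
// Hypothetical scheduler: decides when a CSML recomputation is due.
class UpdateCadence {
  constructor({ maxEvents = 50, maxIntervalMs = 60000, now = Date.now() } = {}) {
    this.maxEvents = maxEvents;
    this.maxIntervalMs = maxIntervalMs;
    this.eventsSinceUpdate = 0;
    this.lastUpdateMs = now;
  }

  // Record one gateway event; returns true when an update should be emitted.
  recordEvent(now = Date.now()) {
    this.eventsSinceUpdate += 1;
    return this.isDue(now);
  }

  // Due when either the event budget or the time budget is exhausted.
  isDue(now = Date.now()) {
    return (
      this.eventsSinceUpdate >= this.maxEvents ||
      now - this.lastUpdateMs >= this.maxIntervalMs
    );
  }

  // Call after emitting CSML_UPDATE to start a new window.
  markUpdated(now = Date.now()) {
    this.eventsSinceUpdate = 0;
    this.lastUpdateMs = now;
  }
}

const cadence = new UpdateCadence({ now: 0 });
console.log(cadence.isDue(59999)); // false
console.log(cadence.isDue(60000)); // true
```

A caller would check `recordEvent()` on each gateway event, and poll `isDue()` on a timer so that quiet periods still produce an update.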
```javascript
// Example update emitted as a ledger event
{
  eventType: "CSML_UPDATE",
  foundationModelId: "gpt-5.2",
  csml: {
    AR: 0.11,
    BP: 0.22,
    SV: 1.19,
    CR: 0.841,
    ledgerIntact: 1,
    score: 0.18,
    trustDelta: 0
  },
  windowSize: 50,
  timestamp: "2026-04-17T02:45:12.123Z",
  prevHash: "sha256:...",
  eventHash: "sha256:..."
}
```
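The `prevHash` and `eventHash` fields are what make the ledger a hash chain, and the `ledger_intact(t)` indicator depends on that chain being unbroken. A minimal sketch of the linkage check follows. It is an assumption-laden illustration: the function name is hypothetical, the hash strings in the usage example are placeholders, and it only verifies that each event points at its predecessor. Recomputing `eventHash` from the payload would also be needed for a full check, but that requires the ledger's canonical serialization, which is not specified here.

```javascript
// Hypothetical linkage check over an ordered array of ledger events.
// Returns 1 if every event's prevHash equals the previous event's eventHash, else 0.
// NOTE: linkage only; it does not recompute eventHash from the event body.
function ledgerIntactIndicator(events) {
  for (let i = 1; i < events.length; i++) {
    if (events[i].prevHash !== events[i - 1].eventHash) {
      return 0; // chain broken at position i
    }
  }
  return 1;
}

// Placeholder hashes for illustration only.
const intactChain = [
  { eventType: "CSML_UPDATE", prevHash: "sha256:genesis", eventHash: "sha256:aaa" },
  { eventType: "CSML_UPDATE", prevHash: "sha256:aaa", eventHash: "sha256:bbb" },
];
const brokenChain = [
  intactChain[0],
  { eventType: "CSML_UPDATE", prevHash: "sha256:zzz", eventHash: "sha256:bbb" },
];

console.log(ledgerIntactIndicator(intactChain)); // 1
console.log(ledgerIntactIndicator(brokenChain)); // 0
```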

Federated CSML (future)

The current CSML is local to one deployment. A federated CSML — where multiple deployments contribute anonymized per-model safety observations to a shared score — is future work. See roadmap.

Tiers

How Δ_trust feeds into the tier escalation function.

Threat model

STRIDE+B class B (“Behavioral Non-Determinism”) uses CSML as its primary mitigation.