Research: TLA+ Spec Augmentation (ID-147)¶
Date: 2026-04-17 Status: In progress — spike plan Scope: Plan and execution record for ID-147: three minimal TLA+ modules augmenting existing Markdown specs with machine-checkable cross-layer consistency properties. Related: ID-147, sdd/formal/README.md, spec 003, spec 019, rfc-0011.
Context: PR 448 (WriteResult) amended five specs in one feature. The manual ripple-check table tells reviewers what to check; it cannot tell them whether the amended specs are mutually consistent. Dafny covers per-operation contracts and data-structure invariants but has no model of how layers compose. This spike explores TLA+ as a lightweight, additive layer for cross-spec protocol consistency — augmenting the Markdown specs, not replacing them.
1. Scope boundaries¶
In scope for this spike:
| TLA+ module | Shadows | Key properties | Status |
|---|---|---|---|
Backend.tla |
spec 003 (CAP-001..CAP-007, BE-001..BE-019) | capability gate ordering, error discrimination, entry partition | Exploratory — no demonstrated bundling yet; targets are abstract-layer properties |
Store.tla |
spec 001 (STORE-004, STORE-005, STORE-006) | store wraps backend, capability gate fires before I/O | Exploratory — thin scaffold; gains concrete targets once WR-019 is in scope |
Observer.tla |
spec 019 (OBS-001..OBS-003) | every completed op fires exactly one event; hook routing matches outcome | Known-value — EventBijection, HookRouting, NoDoubleDispatch mirror the WR-018 pattern of independently-breakable claims |
Explicitly out of scope for this spike:
- WR-019 proxy forwarding (spec 045 / OBS-015): these sections live on
mastervia PR 448 but not on this branch. Captured as a follow-up in ID-147 once spec 045 is available here. - Async store API (spec 029 / ID-013b): higher concurrency complexity; defer until single-threaded model is understood.
- Replacing or superseding any Markdown spec section.
2. What TLC checks that Dafny does not¶
Dafny proves properties within a single method contract or refinement. TLC model-checks reachable states of a composed system by exhaustive (bounded) enumeration. The two tools are complementary:
| Question | Dafny | TLC |
|---|---|---|
| "Does this postcondition hold for all inputs?" | ✅ unbounded proof | bounded (finite model) |
| "Can a failed op still fire on_write instead of on_error?" | ❌ not expressible | ✅ state enumeration |
| "Does the capability gate always fire before any I/O action?" | ❌ cross-layer | ✅ interleaving |
| "Can two layers disagree on WriteResult?" | ❌ cross-module | ✅ composition |
3. Module design¶
3.1 Backend.tla¶
What it models: The abstract backend interface — not the filesystem implementation (that is Dafny's job). This module defines the protocol a backend participates in: what capabilities exist, what errors are possible, what the outcome of each operation class is.
CONSTANTS Paths, Capabilities, Errors
VARIABLES fs, \* function: Paths -> {"file", "dir", "missing"}
declared_caps \* set of declared Capability values
TypeInvariant ==
/\ fs \in [Paths -> {"file", "dir", "missing"}]
/\ declared_caps \subseteq Capabilities
EntryPartition ==
\A p \in Paths: ~(fs[p] = "file" /\ fs[p] = "dir")
\* NOTE: sketch only. In a real TLA+ action, outcome must be declared as
\* a VARIABLE and assigned via a primed expression (outcome').
CapabilityGate(op_cap) ==
op_cap \notin declared_caps =>
outcome' = "CapabilityNotSupported"
Key properties to check:
- EntryPartition: invariant — no path is simultaneously file and dir.
- CapabilityGateFiresFirst: if a capability is missing, no fs mutation
occurs (outcome = CapabilityNotSupported AND fs' = fs).
- ErrorDiscrimination: NotFound # AlreadyExists # InvalidPath # ...
(error types are distinct — catches any future enum collision).
3.2 Store.tla¶
What it models: The Store layer wrapping a Backend. Focuses on the
protocol boundary: Store adds its own capability check (via require()),
then delegates to the backend. The key question: does the Store's gate
always fire before any backend action?
\* Stand-alone module (no EXTENDS Backend).
\* Re-declares fs and declared_caps as its own variables; Backend.tla is a
\* peer module, not a parent. EXTENDS composition is deferred to Phase 6
\* once the hierarchy value is empirically demonstrated via break-and-catch.
VARIABLES fs, \* mirror of Backend fs
declared_caps, \* mirror of Backend declared_caps
store_outcome \* outcome of the Store operation
CapGatePrecedesIO ==
\* If Store fires CapabilityNotSupported, backend action is skipped.
(store_outcome = "CapabilityNotSupported") =>
UNCHANGED fs
For now Store.tla is thin — it establishes the scaffold that WR-019 forwarding will inhabit once spec 045 is available.
3.3 Observer.tla¶
What it models: The observer proxy wrapping Store. Adds an events
sequence and models hook dispatch from spec 019 OBS-001..OBS-003.
The key properties from OBS-003:
- on_write fires after write, write_text, write_atomic.
- on_error fires for any operation that raises.
- on_any fires for every completed operation (success or error).
VARIABLES events \* sequence of StoreEvent records
EventBijection ==
\* Every completed operation produces exactly one event in on_any.
Len(events) = completedOps
HookRouting ==
\* A write that succeeds goes to on_write; one that fails goes to on_error.
\A e \in Range(events):
/\ (e.operation \in WriteOps /\ e.error = None) => on_write_fired(e)
/\ (e.error # None) => on_error_fired(e)
\* CorrelatesOp(e, op) == e.op_id = op \* field predicate; define in module
NoDoubleDispatch ==
\* on_any fires exactly once per operation, even if around hook raises.
\A op \in CompletedOps: Cardinality({e \in Range(events): CorrelatesOp(e, op)}) = 1
NoDoubleDispatch is the most valuable check: the around-hook pattern
(OBS-005) nests around the operation and the hook dispatch. If the
around context manager raises on __exit__, does on_any fire twice?
TLC can enumerate this scenario; Dafny cannot express it.
4. Toolchain¶
4.1 TLC (model checker)¶
TLC is the standard TLA+ model checker. It runs from tla2tools.jar
(self-contained, ~10 MB, Java 11+). No Toolbox GUI required.
Local setup: download tla2tools.jar into scripts/ (gitignored) or use
Docker (same pattern as scripts/dafny_verify.sh). Prefer Docker for CI
reproducibility.
4.2 Module + config structure¶
sdd/formal/tla/
Backend.tla abstract backend protocol (stand-alone)
Store.tla store layer properties (stand-alone; no EXTENDS Backend)
Observer.tla observer proxy properties (stand-alone; no EXTENDS Store)
MC_Backend.tla model-checking harness (instantiates with small Paths)
MC_Store.tla model-checking harness for Store properties
MC_Observer.tla model-checking harness for Observer properties
MC_Backend.cfg TLC config: INIT, NEXT, INVARIANTS
MC_Store.cfg
MC_Observer.cfg
Stand-alone modules per concern: EXTENDS composition was not needed in the
PoC and stand-alone modules were cleaner. Hierarchy via EXTENDS / INSTANCE
can be added in Phase 6 once at least one cross-module invariant demonstrates
value via break-and-catch.
Each MC_*.tla file instantiates the corresponding module with a small
finite model (e.g. Paths == {"a", "b"}, Capabilities == {READ, WRITE}).
This keeps TLC state space tractable while covering all relevant cases.
4.3 CI integration¶
Add a verify-tla job to .github/workflows/ci.yml, triggered when
sdd/formal/tla/** changes. Pattern mirrors the existing verify-formal
(Dafny) job:
verify-tla:
runs-on: ubuntu-latest
if: ...changed files include sdd/formal/tla/...
steps:
- uses: actions/setup-java@v4
with: { java-version: '21' }
- run: bash scripts/tlc_check.sh
scripts/tlc_check.sh runs TLC on each MC_*.tla and exits non-zero on
any invariant violation or TLC error.
5. Implementation plan¶
Each phase is a self-contained session. Commit after each phase so the branch records the progression.
Phase 1 — Toolchain (session 1)¶
- [ ] Create
sdd/formal/tla/directory. - [ ] Write
scripts/tlc_check.sh(Docker wrapper, mirrorsscripts/dafny_verify.sh). - [ ] Verify TLC runs clean on a trivial "hello world" spec.
- [ ] Add
verify-tlastub to.github/workflows/ci.yml(not yet enforced — no modules to check).
Phase 2 — Backend.tla (session 2)¶
- [ ] Write
Backend.tla: CONSTANTS, VARIABLES, TypeInvariant, EntryPartition, CapabilityGate, operation actions. - [ ] Write
MC_Backend.tla+MC_Backend.cfg:Paths == {"a", "b"},Capabilities == {READ, WRITE, DELETE}, INIT, NEXT, INVARIANTS. - [ ] Run TLC. Confirm: all invariants hold, state space is enumerable in < 30 s.
- [ ] Record: authoring time, friction, anything surprising.
Phase 3 — Store.tla (session 3)¶
- [ ] Write
Store.tla: stand-alone module (noEXTENDS Backend), addsCapGatePrecedesIO. - [ ] Write
MC_Store.tla+MC_Store.cfg. - [ ] Run TLC. Confirm
CapGatePrecedesIOholds. - [ ] Intentionally break it (remove gate) and confirm TLC produces a counterexample trace — this validates TLC is checking something real.
Phase 4 — Observer.tla (session 4)¶
- [ ] Write
Observer.tla: stand-alone module (noEXTENDS Store), addsevents,EventBijection,HookRouting,NoDoubleDispatch. - [ ] Write
MC_Observer.tla+MC_Observer.cfg. - [ ] Run TLC. Confirm all properties hold.
- [ ] Introduce a deliberate dispatch bug (fire on_write for failed ops) and confirm TLC catches it with a counterexample.
Phase 5 — Spike report (session 5)¶
Update this document with findings: - Authoring workflow: how long, what was hard, contributor friction estimate. - TLC results: state space size, check time, any real violations found. - Proof depth: which properties are meaningfully bounded vs. trivially small. - Recommendation: extend (which targets next?), keep as-is, or icebox.
6. Open questions¶
-
Co-location:
sdd/formal/tla/keeps TLA+ alongside Dafny as a single formal layer. Alternative:sdd/tla/as a peer tosdd/formal/. Decision:sdd/formal/tla/preferred — both are machine-checkable spec artefacts, not runtime code. -
EXTENDS vs. INSTANCE (deferred): Modules start stand-alone (PoC §4.1 recommendation). If a cross-module invariant is identified that requires shared state,
EXTENDSpulls definitions in unmodified;INSTANCE ... WITHallows variable substitution. Decide in Phase 6 once stand-alone modules have survived break-and-catch and a concrete cross-module target is known. -
Bounded model guarantees: TLC with
Paths == {"a", "b"}exhaustively checks all 2-path scenarios. Is that enough to be meaningful? Initial hypothesis: yes for routing and gating properties; borderline for depth-counting properties (which Dafny already covers). -
CI enforcement vs. informational: Start CI as informational (non-blocking). Promote to blocking only when at least one TLA+ check has caught a real spec regression on a production branch — a caught regression, not a seeded break. Seeded breaks (Phases 3–4) validate that TLC is checking something real; they do not validate that the workflow pays. (PoC §4.3 rationale.)
-
Sync with master: This branch predates PR 448 (spec 045, OBS-015). Once the WR-019 forwarding target is ready, sync from master and add
Store.tlaWR-019 properties as a Phase 6.
7. Relationship to existing formal layer¶
| Layer | Artefact | Tool | Covers |
|---|---|---|---|
| Spec | sdd/specs/*.md |
Review | Human-readable contracts |
| Dafny | sdd/formal/*.dfy |
Dafny verifier | Per-operation pre/postconditions, data invariants, resource safety |
| TLA+ | sdd/formal/tla/*.tla |
TLC | Cross-layer protocol properties, hook routing, composition |
| Tests | tests/ |
pytest + Hypothesis | Runtime conformance, PBT |
No layer replaces another. Dafny and TLA+ answer different questions about the same system.