
Audit 015 — Spec-to-Test Traceability

Backlog item: BK-250
Date: 2026-06-01
Scope: All 48 spec files in sdd/specs/, covering every numbered spec ID in every file.
Method: For each spec file, all invariant IDs were extracted by reading the file. Each ID was then searched across the entire tests/ tree for a matching @pytest.mark.spec("ID") decorator.

Findings fall into four categories:

  • (a) No test at all: the behavior has no coverage and no mark — higher effort, requires writing new tests before adding marks.
  • (b) Test exists, mark absent: a test exercises the behavior under a sibling or parent ID but the spec-file-specific ID is not applied — low-effort label backfill.
  • (c) Spec defect: the spec file itself is the root problem (e.g. duplicate ID, out-of-sequence numbering) — cannot be marked or traced until the spec is repaired.
  • (d) Meta / architectural: the invariant describes a design constraint, dependency rule, or process obligation rather than a testable runtime behavior — not mark-able.
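The mark search described in the method can be sketched as a literal-decorator scan. This is a minimal illustration, not the script the audit used: the function names and the in-memory `sources` mapping are hypothetical, and a real run would read the files under tests/ instead.

```python
import re

# Matches @pytest.mark.spec("ID") or @pytest.mark.spec('ID').
SPEC_MARK = re.compile(r"""@pytest\.mark\.spec\(\s*["']([^"']+)["']\s*\)""")


def marked_ids(test_sources):
    """Return the set of spec IDs that appear in a spec mark in any source."""
    found = set()
    for text in test_sources.values():
        found.update(SPEC_MARK.findall(text))
    return found


def unmarked_ids(spec_ids, test_sources):
    """Return the spec IDs with no matching mark, preserving input order."""
    marked = marked_ids(test_sources)
    return [spec_id for spec_id in spec_ids if spec_id not in marked]


# Hypothetical file contents, for illustration only.
sources = {
    "tests/test_glob.py": '@pytest.mark.spec("GLOB-001")\ndef test_pattern(): ...\n',
}
print(unmarked_ids(["GLOB-001", "STORE-014"], sources))  # ['STORE-014']
```

A scan like this only reports that a mark is absent. It cannot tell category (a) from category (b), because it never looks at whether some test exercises the behavior under a different ID.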

Total: 226 IDs in the table. Of these: 13 are type (d) (not actionable as mark work); 1 is type (c) (spec defect requiring renumber before it can be marked); the remaining 212 are type (a) or (b) and actionable.

Superseded by the Verified addendum. This first-pass split did not separate (a) from (b), and over-counted "actionable". Verification resolves the 212 into 5 genuine coverage gaps (a), ~127 label backfills (b), and the rest not actionable as mark work (deferred/design-only invariants plus the ~57 unbuilt-Graph IDs owned by ID-127). Read the addendum, not this line, for the actionable counts.


Findings

| Spec | ID | Invariant summary | Finding |
|---|---|---|---|
| 001 | STORE-007 | Thread Safety | No mark, no test |
| 001 | STORE-009 | Context manager / resource management | Test exists (tests/test_coverage_gaps.py::TestStoreBehavior::test_context_manager); mark absent |
| 001 | STORE-010 | Store equality | No mark, no test |
| 001 | STORE-011 | Store.to_key | Tests use NPR marks; STORE-011 absent |
| 001 | STORE-014 | list_files(pattern=) | Tests use GLOB-001; STORE-014 absent |
| 001 | STORE-015 | Spec ID collision (type c) | Two distinct invariants share STORE-015 (native_path and glob); STORE-014 appears between them out of sequence. Spec defect: cannot be marked until renumbered |
| 001 | STORE-016 | Depth-limited listing | Tests use DEPTH-001; STORE-016 absent |
| 002 | CFG-007 | Config priority / no env-var merge | No mark, no test |
| 003 | CAP-007 | Quality-flag capabilities | No mark, no test |
| 003 | BE-011 | write_atomic capability gate | Class docstring says "BE-010 through BE-011"; no mark on any test method |
| 003 | BE-023 | Backend.to_key | Tests use NPR-003..005; BE-023 absent |
| 003 | BE-024 | Backend.glob | Tests use GLOB-004; BE-024 absent |
| 003 | BE-026 | iter_children | Tests use ITER-004/005; BE-026 absent |
| 003 | BE-027 | _BACKEND_GATING graph IR metadata | Test exists (tests/scripts/test_gen_graph.py::test_backend_gating_keys_match_backend_members); no BE-027 mark |
| 005 | ERR-013 | ResourceLocked | No test; class absent from source entirely |
| 006 | SIO-004 | No partial reads on error | No mark, no test |
| 006 | SIO-005 | Cancellation propagation | Test exists (tests/aio/test_async_cancellation.py); SIO-005 absent |
| 006 | SIO-006 | No framework dependencies | No mark (d: design principle, not test-markable) |
| 006 | SIO-007 | read_text convenience | Tests use RTXT-001; SIO-007 absent |
| 007 | AW-002 | Capability gate | No mark, no test |
| 007 | AW-004 | Cleanup on failure | Tests use SAW-004/005; AW-004 absent |
| 007 | AW-005 | Intermediate directories for write_atomic | No mark, no test |
| 007 | AW-006 | Local mkstemp + os.replace | No mark, no test |
| 007 | AW-007 | No fallback to non-atomic | No mark, no test |
| 008 | S3-001 | Constructor Parameters | No mark, no test |
| 008 | S3-006 | Virtual Folder Semantics | No mark, no test |
| 008 | S3-011 | delete_folder Recursive | No mark, no test |
| 008 | S3-012 | delete_folder Non-Recursive | No mark, no test |
| 008 | S3-013 | move Via Copy + Delete | No mark, no test |
| 008 | S3-014 | copy Via S3 Server-Side Copy | No mark, no test |
| 010 | NPR-002 | Store.to_key as public helper | No mark, no test |
| 010 | NPR-006 | LocalBackend.to_key | Test exists (tests/backends/local/test_config.py::TestLocalBackendToKeyRoot); NPR-006 absent |
| 010 | NPR-007 | S3Backend.to_key | Test exists; NPR-007 absent |
| 010 | NPR-008 | SFTPBackend.to_key | Test exists (tests/backends/sftp/test_config.py::TestSFTPToKey); NPR-008 absent |
| 010 | NPR-011 | Store.to_key composition | Test exists; NPR-011 absent |
| 010 | NPR-015 | list_folders store-relative names | No mark, no test |
| 010 | NPR-021 | Backend.native_path contract | Test exists (marked BE-025 only); NPR-021 absent |
| 010 | NPR-022 | Store.native_path | Test exists (marked STORE-015 only); NPR-022 absent |
| 011 | S3PA-001 | Constructor Parameters | No mark, no test |
| 011 | S3PA-006 | Dual-Library Architecture | No mark (d: architectural decision, not test-markable) |
| 011 | S3PA-007 | Credential Translation | No mark, no test |
| 011 | S3PA-008 | Virtual Folder Semantics | No mark, no test |
| 011 | S3PA-014 | Copy Via PyArrow | No mark, no test |
| 011 | S3PA-015 | Move Via Hybrid | No mark, no test |
| 011 | S3PA-016 | Delete Via s3fs | No mark, no test |
| 012 | AZ-007 | Container Scope | No mark, no test |
| 012 | AZ-008 | Directory Semantics (HNS) | Tests use BE-005/021; AZ-008 absent |
| 012 | AZ-009 | Virtual Folder Semantics (no HNS) | Tests use BE-* marks; AZ-009 absent |
| 012 | AZ-010 | Write Does Not Create Folder Markers (no HNS) | Tests use BE-008; AZ-010 absent |
| 012 | AZ-012 | exists() | Tests use BE-004; AZ-012 absent |
| 012 | AZ-013 | is_file() / is_folder() | Tests use BE-005; AZ-013 absent |
| 012 | AZ-015 | delete_folder Recursive | Tests use BE-013; AZ-015 absent |
| 012 | AZ-016 | delete_folder Non-Recursive | Tests use BE-013; AZ-016 absent |
| 012 | AZ-017 | Move | Tests use BE-018; AZ-017 absent |
| 012 | AZ-018 | Copy | Tests use BE-019; AZ-018 absent |
| 012 | AZ-019 | Glob | Tests use GLOB-020; AZ-019 absent |
| 012 | AZ-024 | get_folder_info | Tests use BE-017; AZ-024 absent |
| 012 | AZ-036 | HNS Directory-Marker Probe Contract | Tests use BE-021; AZ-036 absent |
| 013 | MEM-001..005 | Constructor, name, capabilities, repr, registration | Tests use BE-001..003; MEM marks absent |
| 013 | MEM-011 | read_bytes() copy semantics | No mark |
| 013 | MEM-013 | write_atomic identical to write | No mark |
| 013 | MEM-016b | copy() deep copy content | No mark |
| 013 | MEM-017..020 | to_key, close, unwrap, no exceptions | Tests use BE-020..022; MEM marks absent |
| 013 | MEM-025 | Single-Lock Serialization | No mark |
| 013 | MEM-026 | Atomicity Scope | No mark |
| 014 | PA-005 | Root Path Is Empty String | No mark |
| 014 | PA-023 | Optional Dependency | No mark |
| 014 | PA-026 | Conformance Across Backends | No mark |
| 016 | BATCH-010 | batch_copy error collection | Test exists (tests/ext/test_batch.py::test_error_continues); BATCH-010 absent |
| 016 | BATCH-013 | batch_copy empty input | Test exists (tests/ext/test_batch.py::test_empty_paths); BATCH-013 absent |
| 016 | BATCH-017 | batch_exists empty input | Test exists (tests/ext/test_batch.py::test_empty_paths); BATCH-017 absent |
| 016 | BATCH-023 | Concurrent result ordering | No mark, no test |
| 016 | BATCH-024 | Concurrent error semantics | No mark, no test |
| 016 | BATCH-025 | Concurrent empty input | No mark, no test |
| 018 | GLOB-015 | No Backend Coupling | Comment in test_glob.py; no mark |
| 018 | GLOB-017 | Empty Results | No mark |
| 018 | GLOB-019 | S3PyArrowBackend Native Glob | No mark |
| 019 | OBS-003a | Hook-to-Operation Mapping | No mark |
| 019 | OBS-015 | WriteResult in Post-Operation StoreEvent | No mark |
| 021 | CFG-014 | Optional Extras | No mark |
| 022 | SAW-009 | SFTPBackend .~tmp + posix_rename | Comment in test_atomic.py; no mark |
| 022 | SAW-010 | S3 buffer + PUT | Comment in test_atomic.py; no mark |
| 022 | SAW-011 | Azure non-HNS buffer + PUT; HNS temp + rename | Comment in test_atomic.py; no mark |
| 022 | SAW-015 | ext.otel span lifecycle | No mark |
| 025 | RET-015 | Graph Retry Mapping | No mark |
| 026 | PING-009 | Error Classification | Docstring in test_check_health.py; no mark |
| 027 | ITER-002 | Capability Gating | No mark |
| 027 | ITER-003 | STORE-008 Update | No mark |
| 027 | ITER-005 | Backend Overrides | Docstring in test_listing.py; no mark |
| 027 | ITER-006 | ext.observe integration | No mark |
| 027 | ITER-008 | Spec Updates (meta) | No mark (d: meta section, not test-markable) |
| 028 | RTXT-002..004 | No Backend ABC change, STORE-008 update, ext.cache integration | No marks |
| 028 | RTXT-006 | Spec Updates (meta) | No mark (d: meta section, not test-markable) |
| 029 | ASYNC-043 | Delegation | No mark |
| 029 | ASYNC-045a | Capability-Gated Methods Graph IR | No mark |
| 029 | ASYNC-052f | head() | No mark |
| 029 | ASYNC-056 | No New Dependencies | No mark (d: architectural constraint, not test-markable) |
| 029 | ASYNC-061 | read_seekable() Deferral | No mark |
| 029 | ASYNC-062 | open_atomic() Deferral | No mark |
| 029 | ASYNC-070..079 | AsyncAzureBackend specifics (dual-mode, lazy init, write strategy, move/copy, content materialization, check_health, capabilities, shared helpers, cleanup, error mapping) | No marks |
| 030 | WTXT-002..003 | No Backend ABC change, STORE-008 update | No marks |
| 030 | WTXT-006 | Symmetric with read_text | No mark |
| 031 | DAG-001 | Serializer Protocol | No mark |
| 032 | HTTP-CON-001..004 | Construction | No marks (test_examples.py uses stale HTTP-001; tests use BE/NPR/SIO marks) |
| 032 | HTTP-TR-001..003 | Transport protocol | No marks |
| 032 | HTTP-PATH-001..004 | URL construction, native_path, to_key, round-trip | No marks (tests use NPR-003) |
| 032 | HTTP-READ-001..002 | read / read_bytes | No marks (tests use SIO-001) |
| 032 | HTTP-EXIST-001..003 | exists / is_file / is_folder | No marks |
| 032 | HTTP-META-001..003 | get_file_info, get_folder_info, known limitations | No marks |
| 032 | HTTP-UNSUP-001 | Write / delete / list unsupported | No mark |
| 032 | HTTP-ERR-001..002 | Error mapping | No marks |
| 032 | HTTP-HEALTH-001 | check_health | No mark |
| 032 | HTTP-LIFE-001..002 | close, unwrap | No marks |
| 032 | HTTP-CRED-001 | Credential masking | No mark |
| 032 | HTTP-RETRY-001 | Retry integration | No mark |
| 036 | SEEK-007 | Azure read() Unchanged | No mark |
| 039 | TLS-008 | tls_ca_bundle on AzureBackend | No mark |
| 039 | TLS-009 | Env var fallback chain for Azure | No mark |
| 039 | TLS-010 | Azure connection_verify injection | No mark |
| 040 | SQL-BLOB-011 | Custom Table Name | No mark |
| 040 | SQL-BLOB-070 | Blob Size Guidelines | No mark |
| 041 | SQL-QUERY-010 | Explicit Query Mapping | No mark |
| 041 | SQL-QUERY-061 | close() | No mark |
| 041 | SQL-QUERY-063 | SQLite PRAGMAs | No mark |
| 041 | SQL-QUERY-090 | Query Execution | No mark |
| 041 | SQL-QUERY-091 | Serialization Overhead | No mark |
| 042 | PDS-009 | Dagster Integration | No mark |
| 043 | RES-001 | Resolution Opacity | No mark |
| 044 | GR-001..057 | Entire Graph backend spec (~55 IDs: constructor, auth, path, read, write, upload session, delete, move, copy, error mapping, retry, file hashes, drive identity, credential masking, to_key, unwrap, close, client options) | No marks anywhere |
| 045 | WR-006 | Sidecar Source | No mark |
| 047 | DOCFRAME-005 | Bridge Replaces Not Augments | No mark (d: doc framework principle, not test-markable) |
| 047 | DOCFRAME-006 | Strict Build, Strict Links | No mark (d: build constraint, not test-markable) |
| 047 | DOCFRAME-007 | Nav and URL Alignment | No mark (d: docs nav rule, not test-markable) |
| 048 | TEST-002 | Conformance is Cross-Backend Spine | No mark (d: testing-process spec, not test-markable) |
| 048 | TEST-003 | Backend-Specific Tests Isolated Per Backend | No mark (d: testing-process spec, not test-markable) |
| 048 | TEST-007 | HTTP Cassette and Replay Layer | No mark (d: testing-process spec, not test-markable) |
| 048 | TEST-008 | Replay Scope is HTTP-Transport Only | No mark (d: testing-process spec, not test-markable) |
| 048 | TEST-009 | Cassette Refresh is Explicit | No mark (d: testing-process spec, not test-markable) |

Verified addendum (2026-06-01)

The original table classified rows by mark presence and used inconsistent wording ("No mark, no test" vs. bare "No mark"), which left the actionable count ambiguous: 212 rows were reported as "type (a) or (b)" without a split between them. This addendum resolves that ambiguity. Every row not already proven type (b) ("test exists" / "tests use X") was re-verified by reading the invariant text and searching the entire tests/ tree for any test that exercises the behavior, marked or not.

Method: per ID — read the invariant, grep the test tree for the behavior (method names, class names, keywords; not just the literal ID), classify as A (no test anywhere), B (tested under a different/absent mark), or D (not a runtime-testable behavior: design principle, meta/process section, or explicitly deferred feature).
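The per-ID decision in this method reduces to three questions asked in a fixed order. The sketch below is one reading of that order; the function name, its boolean inputs, and the "traced" label for a fully marked ID are illustrative assumptions, and the judgment behind each boolean (reading the invariant, grepping for the behavior) remains manual.

```python
def classify(runtime_testable, behavior_tested, id_marked):
    """Classify one spec ID following the addendum's A / B / D scheme.

    runtime_testable: False for design principles, meta/process sections,
                      and explicitly deferred features.
    behavior_tested:  True if any test exercises the behavior, marked or not.
    id_marked:        True if a test carries this exact spec ID as a mark.
    """
    if not runtime_testable:
        return "D"
    if not behavior_tested:
        return "A"
    if not id_marked:
        return "B"
    return "traced"  # assumed label: covered and marked, so not in the table


# Verdicts for three IDs as reported elsewhere in this audit.
print(classify(False, False, False))  # SIO-006 (design principle)  -> D
print(classify(True, False, False))   # HTTP-CON-003 (no test)      -> A
print(classify(True, True, False))    # STORE-014 (under GLOB-001)  -> B
```

Checking testability first matters: a deferred or design-only invariant with no test would otherwise be miscounted as a coverage gap.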

Revised totals

| Category | Count | Meaning |
|---|---|---|
| (a) Untested shipped behavior | 5 | The real coverage debt (table below) |
| (b) Tested, mark absent | ~127 | Label gap; behavior runs (largely via cross-backend conformance under sibling marks) |
| (d) Not runtime-testable | ~33 | Design principles, meta/process sections, and deferred features (TLS Phase 2, ext.parquet Dagster-v2, async read_seekable/open_atomic deferrals, graph retry). More than the 13 the first pass tagged, because several bare "No mark" rows are deferred or design-only |
| Implementation-pending (Graph) | ~57 | Spec 044 GR-001..GR-057 + ERR-013 describe GraphBackend, which is not implemented (absent from source and FEATURES.md). Owned by backlog ID-127; tests and marks land when the backend is built. Not traceability debt. |
| (c) Spec defect | 1 | STORE-015 duplicate ID (unchanged from first pass) |

The first pass's "212 actionable" therefore over-counted. Verification resolves it into 5 genuine coverage gaps (a), ~127 mechanical label backfills (b), and ~77 rows that are not actionable as mark work — ~20 additional deferred/design-only invariants (on top of the 13 the first pass already tagged type-(d)) plus the ~57 unbuilt-Graph IDs owned by ID-127.
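The arithmetic behind these figures can be checked in a few lines. The numbers are taken directly from this audit; the tilde counts are treated as exact here only for the sake of the sum, so the verified split is not expected to land on 212 exactly.

```python
# First-pass totals: exact counts.
first_pass = {"d": 13, "c": 1, "a_or_b": 212}
assert sum(first_pass.values()) == 226

# Verified split of the 212 (approximate counts taken at face value).
verified = {"a": 5, "b": 127, "extra_d": 20, "graph_pending": 57}

not_actionable = verified["extra_d"] + verified["graph_pending"]
resolved = verified["a"] + verified["b"] + not_actionable
d_total = first_pass["d"] + verified["extra_d"]

print(not_actionable)                    # 77
print(d_total)                           # 33
print(resolved)                          # 209
print(first_pass["a_or_b"] - resolved)   # 3, within the rounding of the ~ counts
```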

(a) The 5 untested shipped behaviors

| Spec | ID | Untested behavior | Evidence / note |
|---|---|---|---|
| 008 / 011 | S3-012 | S3 & S3-PyArrow non-recursive delete_folder on a non-empty folder must raise DirectoryNotEmpty | Code raises it (_s3.py:270), but tests/backends/conformance/test_errors.py::...::test_delete_folder_non_recursive_non_empty_raises calls _skip_flat_namespace, skipping S3 and S3PA. SQLBlob has a dedicated test (SQL-BLOB-025); S3/S3PA have none. Highest-severity gap (data-safety guard). |
| 032 | HTTP-CON-004 | ReadOnlyHttpBackend.capabilities == {READ, METADATA} | No test asserts the set (conformance checks type + absence of ATOMIC_MOVE/SEEKABLE_READ only). Three-way divergence: the runtime constant declares {READ, METADATA, LAZY_READ} (_http.py:41), but both the class docstring (_http.py:215) and spec 032 list {READ, METADATA}. LAZY_READ looks like an accidental addition, not an intended capability, which makes the spec-wins resolution (principle 5) the obvious one. Fix before writing the test. |
| 022 | SAW-015 | ext.otel span over the open_atomic lifecycle | tests/ext/test_otel.py asserts spans for read/write/exists/delete only. The around-hook plumbing for open_atomic is exercised via ext.observe, but no otel-span assertion exists. |
| 016 | BATCH-023 | Sequential batch preserves input order; concurrent order is non-deterministic | All concurrent multi-item tests in tests/ext/test_batch.py assert via set(...), so ordering is never pinned. |
| 032 | HTTP-CON-003 | ReadOnlyHttpBackend.name == "http" (literal) | Conformance asserts only that name is a non-empty string. Trivial. |

Partial sub-clause gaps (within otherwise-covered invariants)

Not full type (a), but flagged for completeness — these invariants are covered except for one clause:

  • STORE-007 — share-across-threads is tested (CHILD-010 concurrency test); the immutability clause has no dedicated test (Store is not a frozen dataclass).
  • PA-005 — root-as-empty-string mapping is tested; file ops on root raising FileNotFoundError is not.
  • HTTP-TR-002 — auto-detect + urllib fallback is tested; the httpx-before-requests preference ordering is not directly asserted.
  • HTTP-TR-003 — explicit transport override is tested; the ImportError-when-library-missing branch is pytest.skip-ped, never asserted.
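For the HTTP-TR-003 clause, the missing-library branch can be asserted rather than skipped by making the import fail on purpose. The sketch below relies on a documented Python behavior: a `None` entry in `sys.modules` makes importing that name raise `ModuleNotFoundError`, a subclass of `ImportError`. The `load_transport` function is a hypothetical stand-in, not the backend's real transport loader.

```python
import importlib
import sys
from unittest import mock


def load_transport(library):
    """Hypothetical loader: import the named transport library or raise ImportError."""
    try:
        return importlib.import_module(library)
    except ImportError as exc:
        raise ImportError(f"transport {library!r} requested but not installed") from exc


def import_fails_when_missing(library):
    """Simulate an uninstalled library and report whether ImportError surfaces."""
    # A None entry in sys.modules makes the import machinery raise
    # ModuleNotFoundError, whether or not the library is really installed.
    with mock.patch.dict(sys.modules, {library: None}):
        try:
            load_transport(library)
        except ImportError:
            return True
    return False


print(import_fails_when_missing("httpx"))  # True
```

`mock.patch.dict` restores `sys.modules` on exit, so the simulated absence does not leak into other tests.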

Caveats on the type-(b) verdicts

  • SQL-QUERY-061 / SQL-QUERY-063 ride entirely on shared-base coverage via SqlBlobBackend tests; there is no sqlquery conformance fixture and no SqlQueryBackend-specific close / PRAGMA assertion. The weakest type-(b) rows.
  • GLOB-019 is type (b) only when s3_pyarrow_moto is live in the conformance run; under pyarrow>=24 / moto-unavailable skips, the native S3-PyArrow glob path is not exercised.
  • SAW-009 / SAW-011 are exercised both by the fixture_params(Capability.ATOMIC_WRITE)-parametrized conformance success path and backend-specific assertions (SFTP .~tmp cleanup, Azure rename_file.assert_called_once); SAW-010's S3 buffer mechanism is exercised but not mechanism-asserted (closest is the test that a stream failure mid-write leaves no truncated object).

Discovery follow-up

HTTP-CON-004 surfaced a three-way capability divergence not visible to the first pass: the runtime constant _CAPABILITIES (_http.py:41) declares {READ, METADATA, LAZY_READ}, while the ReadOnlyHttpBackend docstring (_http.py:215) and spec 032 both list only {READ, METADATA}. The code is internally inconsistent — LAZY_READ looks like an accidental addition — which makes the spec-wins resolution (principle 5) the obvious one. Resolve as part of, or before, writing the HTTP-CON-004 test.
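The reason conformance did not catch this divergence can be shown with a stand-in enum. In the sketch below, `Capability` and both frozensets are illustrative copies of the values quoted above, not imports from the project; the two check functions model the existing absence-only assertion and the exact-set assertion that HTTP-CON-004 calls for.

```python
from enum import Enum, auto


class Capability(Enum):
    """Stand-in for the project's capability enum (assumed member names)."""
    READ = auto()
    METADATA = auto()
    LAZY_READ = auto()
    ATOMIC_MOVE = auto()
    SEEKABLE_READ = auto()


SPEC_032 = frozenset({Capability.READ, Capability.METADATA})
RUNTIME = frozenset({Capability.READ, Capability.METADATA, Capability.LAZY_READ})


def passes_absence_check(caps):
    """Models the conformance check: only forbids two capabilities."""
    return Capability.ATOMIC_MOVE not in caps and Capability.SEEKABLE_READ not in caps


def passes_exact_check(caps):
    """Models an HTTP-CON-004 test: the set must equal the spec's set."""
    return caps == SPEC_032


print(passes_absence_check(RUNTIME))                # True: divergence is invisible
print(passes_exact_check(RUNTIME))                  # False: divergence is caught
print(sorted(c.name for c in RUNTIME - SPEC_032))   # ['LAZY_READ']
```

An exact-set assertion written before the constant is fixed would fail on the extra member, which is why resolving the divergence comes first.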