Research: Backend Contract Completeness
Date: 2026-04-05
Status: Complete
Scope: Gap analysis of the backend ABC as a behavioral contract. Identifies six categories of underspecified behavior where concrete backends diverge, maps each to historical bugs and spec gaps, and proposes spec amendments as a prerequisite for BK-139's conformance work.
Related: BK-139, research-bug-prevention-beyond-testing.md, 003-backend-adapter-contract.md, 006-streaming-io.md, 007-atomic-writes.md, 037-depth-limited-listing.md.
Context: The bug-prevention research (BK-139) recommends property-based testing and extended conformance tests to catch cross-backend behavioral inconsistencies. But conformance tests can only verify behavior that is specified. This document asks the upstream question: is the backend ABC well-defined enough to serve as a complete behavioral contract, or are there gaps that must be closed in the specs before testing can be effective?
Finding: The ABC defines the structural contract (signatures, types, capabilities) well but leaves significant behavioral surface area unspecified. The 22 bugs fixed in 0.21.1 cluster in exactly these gaps. Six categories of underspecified behavior are identified below.
1. Methodology
For each abstract method in _backend.py, we compared:
- Spec language: the behavioral guarantees in sdd/specs/003-*, 006-*, 007-*, 036-*, 037-*
- ABC docstrings: any additional constraints stated in code
- Concrete implementations: Local, Memory, S3, SFTP, Azure, SQL
- Store assumptions: what _store.py relies on that the ABC doesn't guarantee
A divergence is a case where two backends produce observably different results for the same input, and the spec does not designate either behavior as correct.
2. Gap Categories
Gap 1: Precondition Evaluation Order
What the spec says: BE-008 states write(overwrite=False) raises
AlreadyExists if the file exists. The spec does not address what happens
when the path names an existing directory, or which error takes precedence
when multiple preconditions fail simultaneously.
What backends do:
| Backend | Path-is-directory | Check timing |
|---|---|---|
| Local (_local.py:149) | InvalidPath (checked first) | Check-then-act |
| Memory (_memory.py:124) | InvalidPath (checked first, under lock) | Atomic |
| S3 (_s3.py:142) | AlreadyExists (no dir/file distinction) | Check-then-act |
| SFTP (_sftp.py:326) | No explicit check | Check-then-act |
| Azure (_azure.py:345) | No explicit check (virtual dirs) | Check-then-act |
| SQL (_sqlalchemy.py:384) | No dir concept | Atomic (transaction) |
Divergence: Calling write("existing_dir", data, overwrite=False) raises
InvalidPath on Local/Memory but AlreadyExists on S3. The caller cannot
write portable error handling without knowing which backend is underneath.
Bugs caused: BUG-154 (Local write() leaking IsADirectoryError).
BUG-153 (Local read()/delete() leaking IsADirectoryError) is related
but belongs to Gap 2 — the read/delete path is an error-discrimination issue,
not a write-precondition issue.
What's missing from the spec:
- BE-008 needs: "If path names an existing directory, raises InvalidPath.
This check MUST precede the overwrite check."
- General rule: precondition checks MUST be evaluated in a defined order:
path validity → type conflict → overwrite conflict → I/O.
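The proposed ordering can be sketched as a single guard that runs before any I/O. This is a minimal illustration, not the library's code: `FakeState`, `check_write_preconditions`, and the path-validity rule (non-empty, no `..` segment) are assumptions made for the example, and the exception classes are local stand-ins for the names used in the spec.

```python
from dataclasses import dataclass, field


class RemoteStoreError(Exception):
    """Local stand-in for the base error named in BE-021."""


class InvalidPath(RemoteStoreError):
    pass


class AlreadyExists(RemoteStoreError):
    pass


@dataclass
class FakeState:
    """Hypothetical view of what exists on the backend (illustration only)."""
    files: set[str] = field(default_factory=set)
    dirs: set[str] = field(default_factory=set)


def check_write_preconditions(state: FakeState, path: str, overwrite: bool) -> None:
    """Raise for the first failing precondition, in the proposed order."""
    # 1. Path validity (assumed rule for this sketch).
    if not path or ".." in path.split("/"):
        raise InvalidPath(f"invalid path: {path!r}")
    # 2. Type conflict: the path names an existing directory.
    if path in state.dirs:
        raise InvalidPath(f"path is a directory: {path!r}")
    # 3. Overwrite conflict: only reached for paths that are not directories.
    if not overwrite and path in state.files:
        raise AlreadyExists(path)
    # 4. The actual I/O would start after this point.


state = FakeState(files={"a.txt"}, dirs={"existing_dir"})
try:
    check_write_preconditions(state, "existing_dir", overwrite=False)
except RemoteStoreError as exc:
    print(type(exc).__name__)  # InvalidPath, regardless of the overwrite flag
```

Because the type-conflict check runs before the overwrite check, a caller sees the same error for a directory path on every backend that adopts this order.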
Gap 2: Error Discrimination Granularity
What the spec says: BE-021 requires all backend-native exceptions to be
mapped to RemoteStoreError subtypes. The spec does not define which
native conditions map to which subtype beyond the per-method Raises:
clauses.
What backends do:
| Scenario | Local | SFTP | S3 | Azure |
|---|---|---|---|---|
| Read a directory | InvalidPath | NotFound (errno) | NotFound (s3fs) | NotFound (blob API) |
| Delete with permission error | PermissionDenied | Swallowed as NotFound (BUG-145) | PermissionDenied | PermissionDenied |
| List with permission error | PermissionDenied | Swallowed silently (BUG-146) | PermissionDenied | Depends on HNS |
| _ensure_parent_dirs permission | PermissionDenied | Swallowed silently (BUG-147) | N/A (no dirs) | N/A (no dirs) |
Divergence: The same failure mode produces different error types across
backends. SFTP's broad except OSError handlers without errno
discrimination turned permission errors into silent returns.
Bugs caused: BUG-145, BUG-146, BUG-147 (SFTP error swallowing);
BUG-153 (Local read()/delete() leaking IsADirectoryError instead of
mapping to NotFound).
What's missing from the spec:
- BE-021 needs a canonical error mapping table per scenario:
| Scenario | Required error type |
|---|---|
| Read/write a path that is a directory | InvalidPath |
| Operation on non-existent file | NotFound |
| Operation denied by credentials | PermissionDenied |
| Parent creation fails (permissions) | PermissionDenied |
| Parent creation fails (path conflict) | InvalidPath |
- Rule: Broad exception handlers (except OSError, except Exception) in backends MUST discriminate by error code or type before mapping. Silent returns are only permitted for exists(), is_file(), is_folder().
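One way to read the discrimination rule is as an errno lookup applied before any mapping. The sketch below is an assumption-laden illustration: the exception classes are local stand-ins, `map_os_error` and `delete` are hypothetical helpers, and the errno-to-subtype table (in particular ENOTDIR mapping to `InvalidPath`) is one plausible reading of the canonical table rather than spec text.

```python
import errno


class RemoteStoreError(Exception): ...
class NotFound(RemoteStoreError): ...
class PermissionDenied(RemoteStoreError): ...
class InvalidPath(RemoteStoreError): ...


# Assumed mapping for this sketch; the spec amendment would own the real table.
_ERRNO_MAP = {
    errno.ENOENT: NotFound,
    errno.EACCES: PermissionDenied,
    errno.EPERM: PermissionDenied,
    errno.EISDIR: InvalidPath,
    errno.ENOTDIR: InvalidPath,
}


def map_os_error(exc: OSError, path: str) -> RemoteStoreError:
    """Pick the subtype by errno; unknown codes fall back to the base error."""
    error_type = _ERRNO_MAP.get(exc.errno, RemoteStoreError)
    return error_type(f"{path}: {exc.strerror or exc}")


def delete(path: str, native_remove) -> None:
    """Hypothetical backend method: never swallows, always discriminates."""
    try:
        native_remove(path)
    except OSError as exc:
        raise map_os_error(exc, path) from exc
```

With this shape, a permission failure during delete can no longer be reported as NotFound, because the handler inspects `exc.errno` instead of treating every `OSError` alike.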
Gap 3: Listing Behavior on Non-Existent Paths
What the spec says: BE-014 defines list_files() postconditions (returns
only files, supports recursive) but does NOT specify behavior when path
doesn't exist. BE-026 (iter_children) explicitly says "non-existent paths
yield nothing" — but this guarantee is not stated for list_files() or
list_folders().
What backends do:
| Backend | list_files(missing_path) | list_folders(missing_path) |
|---|---|---|
| Local (_local.py:272) | Empty iterator (silent) | Empty iterator (silent) |
| Memory (_memory.py:211) | Empty iterator (silent) | Empty iterator (silent) |
| S3 (_s3_base.py:~150) | Empty iterator (S3 returns empty for missing prefixes) | Empty iterator |
| SFTP (_sftp.py:~460) | Empty iterator (catches ENOENT) | Empty iterator |
| Azure (_azure.py:~510) | Empty iterator: HNS catches NotFound and returns; non-HNS prefix query returns empty naturally | Empty iterator |
| SQL (_sqlalchemy.py:~455) | Empty iterator (prefix query returns no rows) | Empty iterator |
Current state: All backends agree on the missing-path case (empty
iterator), but this is accidental consensus — the spec does not require it.
A new backend implementer could reasonably raise NotFound. Note: Azure HNS
does propagate non-NotFound errors differently from non-HNS (see Gap 2),
but for the missing-path case specifically, both code paths converge on empty.
Store safety net: The Store does NOT catch NotFound from listing
operations — it trusts the backend to return empty. If a backend raised
NotFound, callers would see an unhandled exception.
What's missing from the spec:
- BE-014 needs: "If path does not exist or is not a folder, the iterator
yields nothing. list_files() MUST NOT raise NotFound for missing
paths."
- Same clause needed for BE-015 (list_folders).
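The proposed clause can be illustrated with a filesystem-backed listing that returns an empty iterator for a missing path or a non-folder. This is a sketch under assumptions: the standalone `list_files` function and its `recursive` flag are hypothetical and only mirror the shape of the method described in BE-014.

```python
import os
import tempfile
from collections.abc import Iterator


def list_files(root: str, recursive: bool = False) -> Iterator[str]:
    """Yield file paths under root; yield nothing if root is missing or a file."""
    if not os.path.isdir(root):
        return  # empty iterator, no NotFound
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
        if not recursive:
            dirnames.clear()  # prune so os.walk does not descend below root


with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "does-not-exist")
    print(list(list_files(missing)))  # [] and no exception
```

Stating the guard explicitly, rather than relying on the storage API happening to return nothing, is what turns the current accidental consensus into a testable requirement.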
Gap 4: max_depth Counting Semantics
What the spec says: DEPTH-001 defines depth as "the number of folder levels between the listing root and the file's parent directory" with an example:
    store.list_files("data", max_depth=1)
    data/file_a.csv           → depth 0 ✓ included
    data/raw/file_b.csv       → depth 1 ✓ included
    data/raw/2026/file_c.csv  → depth 2 ✗ excluded
The spec provides this example but does not specify a reference algorithm.
What backends do:
| Backend | Depth algorithm | Code location |
|---|---|---|
| Local | len(Path(dirpath).relative_to(full).parts) (path-part count) | _local.py:282 |
| Memory | Iterative DFS with depth counter on stack entries | _memory.py:540 |
| SFTP | Recursive _depth parameter incremented per level | _sftp.py:490 |
| S3 | Breadth-first queue with depth counter | _s3_base.py:~155 |
| Azure | rel.count("/") (slash-counting) | _azure.py:524 |
| SQL | suffix.count("/") (slash-counting) | _sqlalchemy.py:476 |
Divergence: Azure and SQL count / characters in the relative path
suffix. Local/Memory/SFTP count directory traversal levels. These produce
identical results for well-formed paths but could diverge for edge cases
(trailing slashes, empty segments from double-slashes, etc.).
Comparison operator divergence:
| Backend | Condition | What it controls |
|---|---|---|
| Local | depth > max_depth → skip dir contents | Files at max_depth included; dirs at max_depth not entered |
| Memory | depth >= max_depth → don't recurse into subdir | Files at max_depth included (yielded before recursion check) |
| Azure | depth > max_depth → skip file | Post-hoc filter on yielded files |
| SQL | depth > max_depth → skip file | Post-hoc filter on yielded files |
| SFTP | depth >= max_depth → don't recurse into subdir | Files at max_depth included (yielded before recursion check) |
Memory and SFTP use >= but apply the check to subdirectory recursion, not
to file yielding — files at exactly max_depth are still yielded. Local/Azure/SQL
use > but apply the check differently (traversal pruning vs post-hoc filter).
All backends produce the same observable results for well-formed paths. The
divergence is in implementation strategy, not in output — but the lack of a
reference algorithm means a new backend could misinterpret the semantics.
Store safety net: DEPTH-003 says the Store applies client-side depth
filtering as a correctness guarantee even if backends get it wrong
(_store.py:393-396). This masks backend bugs rather than preventing them.
Bugs caused: BUG-152 (S3 ignoring max_depth), BUG-155 (Azure ignoring max_depth). Note: both bugs were missing-implementation (the parameter was accepted but filtering logic was absent), not algorithm-disagreement bugs. A reference algorithm in the spec would have made these gaps obvious during implementation and review.
What's missing from the spec:
- DEPTH-001 needs a reference algorithm, not just an example:
"Depth is computed as the number of path segments between the listing root
and the file's parent, i.e., len(RemotePath(file.path).parent.parts) -
len(RemotePath(root).parts). max_depth=N includes files where
depth <= N."
- Explicitly state the comparison: inclusive (depth <= max_depth), not
exclusive.
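The reference algorithm proposed above can be written out directly. In this sketch `PurePosixPath` stands in for `RemotePath` (an assumption: the two are presumed to split paths into parts the same way), and `slash_depth` is an illustrative approximation of slash-counting, not the actual Azure or SQL code.

```python
from pathlib import PurePosixPath


def depth_of(file_path: str, root: str) -> int:
    """Folder levels between the listing root and the file's parent."""
    parent_parts = PurePosixPath(file_path).parent.parts
    return len(parent_parts) - len(PurePosixPath(root).parts)


def within_max_depth(file_path: str, root: str, max_depth: int) -> bool:
    """Inclusive comparison: files at exactly max_depth are included."""
    return depth_of(file_path, root) <= max_depth


def slash_depth(file_path: str, root: str) -> int:
    """Illustrative slash-counting variant for comparison."""
    suffix = file_path[len(root):].lstrip("/")
    return suffix.count("/")


for path in ("data/file_a.csv", "data/raw/file_b.csv", "data/raw/2026/file_c.csv"):
    print(path, depth_of(path, "data"), within_max_depth(path, "data", 1))

# A doubled slash is where the two strategies can disagree:
print(depth_of("data/raw//file_b.csv", "data"))     # 1 (empty segment collapsed)
print(slash_depth("data/raw//file_b.csv", "data"))  # 2 (every slash counted)
```

The loop reproduces the DEPTH-001 example (depths 0, 1, 2 with only the first two included at max_depth=1), and the last two lines show why a parts-based definition is the safer reference for malformed input.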
Gap 5: Move/Copy Atomicity and Failure Modes
What the spec says: BE-018 defines move(src, dst) as "renames/moves a
file" with error conditions for missing source and existing destination. The
spec says nothing about atomicity, partial failure, or whether the operation
can leave data in both locations.
AW-006 defines atomicity for write_atomic on Local (POSIX os.replace),
but no spec covers move/copy atomicity per backend.
What backends do:
| Backend | move() implementation | Atomic? | Failure mode |
|---|---|---|---|
| Local (_local.py:360) | shutil.move() → os.rename | Yes (same FS) | Fails cleanly |
| Memory (_memory.py:369-370) | Dict detach+attach under lock | Yes | Fails cleanly |
| S3 (_s3.py:213-214) | copy() then rm() | No | Copy succeeds + rm fails = duplicate |
| SFTP (_sftp.py:603-614) | posix_rename() → rename() → copy+delete | Best-effort cascade | May leave duplicate on fallback path |
| Azure HNS (_azure.py:674-677) | rename_file() | Yes | Fails cleanly |
| Azure non-HNS (_azure.py:679-681) | start_copy_from_url() + delete_blob() | No | Copy succeeds + delete fails = duplicate |
| SQL (_sqlalchemy.py:617) | UPDATE ... SET key=dst WHERE key=src | Yes (transaction) | Fails cleanly |
Divergence: S3 and Azure non-HNS can leave files in both source and destination on partial failure. Memory, Local, and SQL cannot. This is an observable behavioral difference invisible from the ABC.
What's missing from the spec:
- BE-018 needs: "Backends SHOULD implement move() atomically where the
underlying storage supports it. If the backend uses a copy-then-delete
strategy, it MUST document this in its capability declaration. The caller
MUST NOT assume atomicity unless the backend declares it."
- Consider a Capability.ATOMIC_MOVE (parallel to ATOMIC_WRITE) so
callers can query whether move is safe under concurrent access.
- Same analysis applies to BE-019 (copy()) — though copy has no
delete-after phase, partial failure can still leave a corrupt destination.
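The difference between the two strategies, and how a capability flag could expose it, can be sketched with two toy backends. Everything here is hypothetical: the `Capability` flag enum is a local stand-in (the real type's shape is not shown in this document), `ATOMIC_MOVE` is only a proposal, and the dict-based backends exist purely to demonstrate the failure mode.

```python
import enum


class Capability(enum.Flag):
    ATOMIC_WRITE = enum.auto()
    ATOMIC_MOVE = enum.auto()  # proposed, not an existing capability


class CopyThenDeleteBackend:
    """Toy object store: move is copy followed by delete."""
    capabilities = Capability(0)

    def __init__(self, fail_delete: bool = False) -> None:
        self.objects: dict[str, bytes] = {}
        self.fail_delete = fail_delete

    def move(self, src: str, dst: str) -> None:
        self.objects[dst] = self.objects[src]  # copy phase succeeds
        if self.fail_delete:
            raise OSError("delete failed")  # simulated partial failure
        del self.objects[src]


class RenameBackend:
    """Toy store with a single-step rename."""
    capabilities = Capability.ATOMIC_MOVE

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def move(self, src: str, dst: str) -> None:
        self.objects[dst] = self.objects.pop(src)


def move_is_atomic(backend) -> bool:
    """Caller-side query, as the proposed amendment would allow."""
    return Capability.ATOMIC_MOVE in backend.capabilities


backend = CopyThenDeleteBackend(fail_delete=True)
backend.objects["src"] = b"payload"
try:
    backend.move("src", "dst")
except OSError:
    print(sorted(backend.objects))  # ['dst', 'src']: data in both locations
```

A caller that checks `move_is_atomic()` first can decide whether to add its own cleanup or verification step after a failed move.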
Gap 6: Resource Lifecycle on Error Paths
What the spec says: SIO-001 says "the caller is responsible for consuming and closing the stream." SIO-004 says "if a read operation fails, no partial stream is returned — the error is raised before any data is returned." SIO-005 says "partially opened resources are cleaned up where possible."
The spec does not define what happens in the gap between acquiring the raw handle and returning the wrapped stream to the caller. This is the acquire-then-wrap phase.
What backends do:
    # Pattern in all affected backends:
    def read(self, path: str) -> BinaryIO:
        raw = self._native_open(path)           # ← raw handle acquired
        return _ErrorMappingStream(raw, ...)    # ← if __init__ raises, raw leaks
| Backend | Wrapping layers | Protected? |
|---|---|---|
| Local | Direct open() → no wrapping | N/A (OS handles) |
| Memory | BytesIO from dict | N/A (in-memory) |
| S3 (_s3.py:130-134) | s3fs handle → _ErrorMappingStream → BufferedReader | No (BUG-159) |
| SFTP (_sftp.py:~385) | paramiko SFTPFile → _ErrorMappingStream | Fixed (BUG-142) |
| Azure (_azure.py:~310) | Azure downloader → _ErrorMappingStream | Fixed (BUG-158) |
Bugs caused: BUG-142, BUG-156, BUG-158, BUG-159 (4 resource leak bugs).
What's missing from the spec:
- SIO-001 needs: "Between acquiring a raw handle and returning the wrapped
stream, backends MUST ensure the raw handle is closed if wrapping fails.
The recommended pattern is _safe_wrap(raw, wrapper) which closes raw
on any exception from wrapper."
- This is not just a testing gap — it's a contract gap. The ABC docstring
for read() should state the resource safety invariant.
3. Combinatorial View
The backend ABC has ~18 abstract methods. Each can be called with various parameter combinations. Across 7 backends, the behavioral test space is:
    Methods (18) × Parameter combos (~5 avg) × Backends (7) × State conditions (~3)
    = ~1,890 behavioral test points
The current conformance suite covers ~490 of these (~26%). The gaps above identify the categories where the remaining 74% contains bugs.
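The estimate is a plain product of the four factors, so it is easy to recompute if any factor changes. The snippet below only restates the numbers given above; the variable names are chosen for the example.

```python
# Factors as stated in the text (the averages are the document's estimates).
methods = 18
param_combos = 5
backends = 7
state_conditions = 3

total = methods * param_combos * backends * state_conditions
covered = 490

print(total)                         # 1890
print(round(100 * covered / total))  # 26 (percent covered)
```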
A property-based stateful model (BK-139 P4) can explore this space combinatorially, but only if the oracle — the behavioral contract — defines what "correct" means for each point. Currently, ~30% of the behavioral surface has no specified correct answer.
4. Relationship to BK-139
BK-139 implements prevention strategies from the bug-prevention research. This contract analysis is upstream — it defines what the tests should verify.
| BK-139 deliverable | Depends on contract? | Which gap? |
|---|---|---|
| _safe_wrap helper | No (structural fix) | - |
| Hypothesis P4 (stateful model) | Yes: model needs oracle | Gaps 1–5 |
| Hypothesis P1–P3 | No (pure-function roundtrips) | - |
| Extended conformance (param combos) | Yes: expected values need spec | Gaps 1, 3, 4 |
| Extended conformance (error fidelity) | Yes: correct error type needs spec | Gap 2 |
| Extended conformance (resource cleanup) | Partially: pattern is clear | Gap 6 |
| check_error_handling.py | No (static pattern) | - |
Recommendation: Spec amendments for Gaps 1–5 should be completed before
or alongside BK-139's P4 and extended conformance work. Gap 6's amendment
can ship with the _safe_wrap deliverable.
5. Proposed Spec Amendments
| Spec | Gap | Amendment |
|---|---|---|
| BE-008 | 1 | Add precondition order: path validity → type conflict → overwrite |
| BE-021 | 2 | Add canonical error mapping table for cross-cutting scenarios |
| BE-014 | 3 | Add "MUST NOT raise NotFound for missing paths; yields nothing" |
| BE-015 | 3 | Same clause for list_folders() |
| DEPTH-001 | 4 | Add reference algorithm with RemotePath.parts counting and inclusive comparison |
| BE-018 | 5 | Document atomicity is backend-dependent; consider ATOMIC_MOVE capability |
| SIO-001 | 6 | Add acquire-then-wrap safety invariant |
These amendments tighten the contract without changing any existing correct behavior — they codify what all backends already do (or should do). Where a backend currently diverges from the tightened contract, that's a bug to fix.
6. Impact on Test Design
With tightened specs, the test design becomes straightforward:
For conformance tests (BK-139 extended suite):
- Gap 1: test_write_on_directory_raises_invalid_path (all backends)
- Gap 2: test_read_on_directory_raises_remote_store_error (all backends)
- Gap 3: test_list_files_missing_path_yields_nothing (all backends)
- Gap 4: test_list_files_max_depth_reference_algorithm (all backends)
- Gap 5: Document-only (no behavioral test, but capability query test)
- Gap 6: test_read_wrapper_failure_closes_raw_handle (per-backend with monkeypatch)
For property-based model (BK-139 P4):
- The model oracle can now define:
  - write(dir_path) → InvalidPath (not AlreadyExists)
  - list_files(missing) → empty (not NotFound)
  - move() post-state: src gone, dst present (atomicity is backend flag)
  - depth counting uses parts-based algorithm
Without these amendments, the P4 oracle would need per-backend exception tables, defeating the purpose of a unified behavioral contract.
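To make the idea of a single oracle concrete, here is a small reference model in plain Python that encodes three of the rules listed above. It is a sketch under assumptions: `ModelStore` is a hypothetical name, folders are inferred from file-path prefixes, `list_files` is non-recursive, and the exception classes are local stand-ins. A stateful property test would run the same operations against this model and a real backend and compare outcomes.

```python
class RemoteStoreError(Exception): ...
class NotFound(RemoteStoreError): ...
class AlreadyExists(RemoteStoreError): ...
class InvalidPath(RemoteStoreError): ...


class ModelStore:
    """Backend-independent oracle: one expected answer per operation."""

    def __init__(self) -> None:
        self.files: dict[str, bytes] = {}

    def _is_folder(self, path: str) -> bool:
        prefix = path.rstrip("/") + "/"
        return any(key.startswith(prefix) for key in self.files)

    def write(self, path: str, data: bytes, overwrite: bool = False) -> None:
        if self._is_folder(path):  # type conflict is checked first
            raise InvalidPath(path)
        if not overwrite and path in self.files:
            raise AlreadyExists(path)
        self.files[path] = data

    def list_files(self, path: str) -> list[str]:
        # Missing paths fall through to an empty result, never NotFound.
        prefix = path.rstrip("/") + "/"
        return sorted(
            key for key in self.files
            if key.startswith(prefix) and "/" not in key[len(prefix):]
        )

    def move(self, src: str, dst: str) -> None:
        if src not in self.files:
            raise NotFound(src)
        if dst in self.files:
            raise AlreadyExists(dst)
        self.files[dst] = self.files.pop(src)  # post-state: src gone, dst present


model = ModelStore()
model.write("data/a.csv", b"1")
print(model.list_files("missing"))  # []
model.move("data/a.csv", "data/b.csv")
print(model.list_files("data"))     # ['data/b.csv']
```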