Research: Bug Prevention Beyond Testing

Date: 2026-04-03
Status: Complete
Scope: Systematic analysis of 0.21.1 bug root causes and evaluation of prevention strategies beyond testing (DbC, PBT, extended conformance, resource safety, static analysis).
Related: BK-139, BUG-159, TESTING.md, research-testing-best-practices.md.

Context: The 0.21.1 patch release fixed 22 bugs despite spec-driven development (SDD), 95% coverage, mutation testing, and review-enforced test quality rules. This document analyzes root causes, evaluates prevention strategies beyond testing, and recommends concrete actions.

1. Bug Taxonomy (0.21.1)

Categorizing the 22 bugs fixed in 0.21.1 by root cause pattern reveals seven clusters:

| Pattern | Bugs | Count |
|---|---|---|
| Cross-backend behavioral inconsistency | BUG-150, 151, 152, 155 | 4 |
| Resource leak on error path | BUG-142, 144, 156, 158 | 4 |
| Error swallowing / wrong error mapping | BUG-145, 146, 147 | 3 |
| Edge-case inputs not rejected | BUG-136, 139, 140, 141 | 4 |
| Cache coherency | BUG-137, 138 | 2 |
| Mutation of caller data | BUG-148 | 1 |
| Edge-case behavior on unexpected input | BUG-143, 153, 154, 157 | 4 |

Key observation: Most bugs are NOT logic errors in core algorithms. They are behavioral gaps at boundaries — parameter combinations the conformance suite didn't cover, error paths where native exceptions leaked or were swallowed, resource lifecycle edges, and input values nobody expected.

Testing alone cannot efficiently prevent these. The conformance suite tests each method's primary contract, but the bug surface is in the cross-product of methods, parameters, backends, and failure modes.

2. Strategies Evaluated

Five strategies were evaluated in depth (§ 2.1 to § 2.5), and three more were considered and set aside (§ 2.6). Each was judged for fit with remote-store's design principles (zero runtime deps, SDD/spec-driven, mypy strict, minimal complexity).

2.1 Design-by-Contract (DbC)

Libraries evaluated: icontract (mature, ABC inheritance via DBCMeta), deal (static linter, weaker ABC story), dpcontracts/PyContracts (dead).

Verdict: Targeted assert only — no library.

Rationale:

- Adding icontract breaks the zero-runtime-dependency guarantee.
- The conformance suite already IS the DbC system: it tests pre/post conditions for every BE-xxx spec across all backends.
- The 0.21.1 bugs were not caused by missing contract infrastructure; they were caused by missing test cases.
- icontract's strongest feature (contract inheritance on ABC) overlaps entirely with what the parameterized conformance suite does.
- No major Python storage library (fsspec, smart_open, SQLAlchemy, PyArrow) uses a DbC library. The industry pattern is: explicit validation for external input, assert for internal invariants, conformance tests for behavioral contracts.

What to do instead:

- Bare assert for internal invariants in complex methods (e.g., after stream wrapping: assert not stream.closed). Stripped by python -O.
- Explicit raise TypeError/ValueError for external input validation (config parsing, constructor args). This is ordinary validation, not DbC.
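The split between the two kinds of check can be sketched as follows. This is a minimal illustration, not remote-store code: `parse_backend_options` and `wrap_stream` are hypothetical names chosen for the example.

```python
import io
from typing import Any, BinaryIO


def parse_backend_options(raw: Any) -> dict[str, Any]:
    """External input: validate explicitly; these checks survive `python -O`."""
    if raw is None:
        return {}
    if not isinstance(raw, dict):
        raise TypeError(f"options must be a mapping, got {type(raw).__name__}")
    return dict(raw)  # copy, so the caller's data is never mutated


def wrap_stream(raw: BinaryIO) -> BinaryIO:
    """Internal invariant: a bare assert documents what must already hold."""
    stream = io.BufferedReader(raw)  # type: ignore[arg-type]
    # Stripped under `python -O`, so it must never guard external input.
    assert not stream.closed, "wrapping must not close the stream"
    return stream
```

The explicit `raise` is part of the public behavior and can be covered by conformance tests; the `assert` is a development-time tripwire only.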

2.2 Property-Based Testing (PBT)

Tool: Hypothesis (dev dependency only, zero runtime impact).

Verdict: Adopt — high value for 4 specific targets.

PBT excels where the input space is combinatorial and there is a clear oracle (roundtrip property, "no silent corruption" invariant, dict-model equivalence). The 0.21.1 parsing/config bugs (BUG-136, 139, 140, 141) are textbook PBT targets.

Priority targets:

| # | Target | Property | Bugs caught |
|---|---|---|---|
| P1 | Partition roundtrip | `parse(build(k,v)) == (k,v)` | BUG-141 |
| P2 | Config `from_dict` | Never silently corrupts (no `"None"` strings) | BUG-139, 140 |
| P3 | Path normalization | Idempotent; hostile input normalizes or raises | BUG-136 |
| P4 | Stateful backend model | MemoryBackend matches dict under random ops | BUG-152, 155 |

P1–P3 are pure functions: microsecond execution, ~80 lines total. P4 uses Hypothesis RuleBasedStateMachine: model the backend as a Python dict, generate random write/read/delete/list sequences, and verify the real backend matches. ~60 lines. P4 is the highest-value target: it catches cross-backend behavioral inconsistency, the hardest bug class to find manually and one of the four largest clusters in 0.21.1 (4 bugs each, tied with resource leaks and the two edge-case clusters).
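The model-based idea behind P4 can be illustrated without any dev dependency. The sketch below deliberately swaps Hypothesis for a seeded `random.Random`, so it has no shrinking or example database; it only shows the "real backend vs. dict model" comparison. `ToyBackend` is a hypothetical stand-in, not remote-store's MemoryBackend.

```python
import random


class ToyBackend:
    """Hypothetical stand-in for a real backend."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]

    def delete(self, path: str) -> None:
        del self._blobs[path]

    def list_files(self) -> list[str]:
        return sorted(self._blobs)


def check_against_model(backend: ToyBackend, seed: int, steps: int = 200) -> None:
    """Apply random operations to backend and dict model; compare after each step."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    model: dict[str, bytes] = {}
    paths = ["a.txt", "dir/b.txt", "dir/sub/c.txt"]
    for _ in range(steps):
        op = rng.choice(["write", "read", "delete"])
        path = rng.choice(paths)
        if op == "write":
            data = bytes(rng.getrandbits(8) for _ in range(rng.randint(0, 8)))
            backend.write(path, data)
            model[path] = data
        elif op == "read" and path in model:
            assert backend.read(path) == model[path]
        elif op == "delete" and path in model:
            backend.delete(path)
            del model[path]
        # Checked after every step, playing the role of a Hypothesis invariant.
        assert backend.list_files() == sorted(model)


check_against_model(ToyBackend(), seed=0)
```

In the real suite, each operation would become a `rule` on a `RuleBasedStateMachine`, and Hypothesis would additionally shrink a failing sequence to a minimal reproducer.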

CI setup: Three profiles (dev=50 examples, ci=100, nightly=1000). Total CI impact: ~10 seconds. Add .hypothesis/ to .gitignore. Hypothesis must be strictly a dev dependency (in [tool.hatch.envs.default] dependencies, never in [project.dependencies]) to maintain the zero-runtime-dep guarantee.

Concrete example — would have caught BUG-141:

from hypothesis import given, strategies as st

# partition_key and partition_value are strategies assumed to be defined elsewhere.
@given(
    keys=st.lists(partition_key, min_size=1, max_size=4, unique=True),
    values=st.lists(partition_value, min_size=1, max_size=4),
    filename=st.text(alphabet=..., min_size=1, max_size=30),
)
def test_partition_roundtrip(keys, values, filename):
    # zip() stops at the shorter list, so pairs may hold fewer entries than keys.
    pairs = dict(zip(keys, values[:len(keys)]))
    path = partition_path(filename, **pairs)
    parsed = parse_partition(path)
    assert parsed.partitions == pairs
    assert parsed.filename == filename

Concrete example — would have caught BUG-139, BUG-140:

from hypothesis import given, settings

# config_strategy is assumed to be defined elsewhere.
@given(data=config_strategy)
@settings(max_examples=200)
def test_config_from_dict_no_silent_corruption(data):
    try:
        rc = RegistryConfig.from_dict(data)
    except (TypeError, ValueError, KeyError):
        return  # valid rejection
    for bc in rc.backends.values():
        assert bc.type != "None"       # BUG-140
        assert isinstance(bc.options, dict)  # BUG-139

2.3 Cross-Backend Extended Conformance Suite

Verdict: Adopt — ~58 new tests in 6 categories.

The existing conformance suite tests each method's primary contract but not parameter combinations, edge-case inputs, error fidelity, metadata fields, or resource cleanup. Every 0.21.1 bug class maps to one of these gaps.

Structure: New file tests/backends/test_conformance_extended.py with @pytest.mark.extended_conformance. Reuses existing backend fixture — zero new infrastructure.

| Category | Priority | Tests | Bugs caught |
|---|---|---|---|
| Parameter combinations | CRITICAL | ~15 | BUG-152, 155 |
| Edge-case inputs | HIGH | ~12 | BUG-143, 153, 154 |
| Error fidelity | HIGH | ~10 | BUG-145, 146, 147 |
| Metadata consistency | MEDIUM | ~8 | BUG-150, 151 |
| Resource cleanup | MEDIUM | ~6 | BUG-142, 144, 156, 158 |
| Operational consistency | MEDIUM | ~7 | General regression |

Key design decisions:

- Edge-case tests assert on RemoteStoreError (base class), not specific subclasses. The contract is "no native exceptions leak."
- ETag test uses a _BACKENDS_WITH_ETAG set documenting the per-backend contract. S3PyArrow would have been in the set and failed (BUG-150/151).
- Resource cleanup tests verify the observable contract cross-backend; deep monkeypatch tests (wrapper failure injection) stay backend-specific.

CI impact risk: 58 tests x 7 backends = ~400 parameterized cases. Tests against mocked backends (moto, Azurite, in-memory SFTP) are fast, but watch for bloat. Use @pytest.mark.extended_conformance so these can be separated into integration/nightly CI if they slow the default run. Keep the extended suite focused — test dangerous parameter combinations, not exhaustive permutations.

Concrete example — would have caught BUG-152, BUG-155:

@pytest.mark.parametrize(
    ("recursive", "max_depth", "expected_names"),
    [
        pytest.param(True, 0, {"a.txt"}, id="depth0"),
        pytest.param(True, 1, {"a.txt", "b.txt", "c.txt"}, id="depth1"),
        pytest.param(True, 2, {"a.txt", "b.txt", "c.txt", "d.txt"}, id="depth2"),
        pytest.param(True, None, {"a.txt", "b.txt", "c.txt", "d.txt", "e.txt"}, id="unlimited"),
    ],
)
def test_list_files_recursive_max_depth(self, backend, recursive, max_depth, expected_names):
    _require(backend, Capability.LIST, Capability.WRITE)
    _seed(backend, self.DEPTH_TREE)
    files = list(backend.list_files("pc", recursive=recursive, max_depth=max_depth))
    assert {f.name for f in files} == expected_names

Concrete example — would have caught BUG-153, BUG-154:

def test_read_on_directory(self, backend):
    _require(backend, Capability.WRITE)
    backend.write("dir_r/file.txt", b"x")
    with pytest.raises(RemoteStoreError):
        backend.read("dir_r")

2.4 Resource Safety Patterns

Verdict: Adopt _safe_wrap helper — also found a latent bug.

The 4 resource-leak bugs (BUG-142, 144, 156, 158) share a single pattern: backend read() acquires a low-level resource (file handle, paramiko channel, Azure downloader), then wraps it in _ErrorMappingStream / BufferedReader. If wrapping fails, the raw resource leaks.

Latent bug found during research (filed as BUG-159): Both S3 backends have unprotected acquire-then-wrap in read(). S3Backend.read() (_s3.py:130-134) wraps an s3fs handle in _ErrorMappingStream → BufferedReader (double-layer, same as BUG-142/158). S3PyArrowBackend (_s3_pyarrow.py:196-200) wraps a PyArrow NativeFile in _PyArrowBinaryIO → _ErrorMappingStream (single-layer, different resource type). If any wrapping constructor raises, the raw handle leaks.

Recommended deliverables:

| Action | Value | Effort |
|---|---|---|
| `_safe_wrap()` helper in `_stream.py` | HIGH | ~10 lines |
| Fix S3Backend.read() using the helper | BUG FIX | Uses helper |
| `ResourceWarning` in `__del__` for SFTP/Azure backends | LOW-MED | ~5 lines each |

The helper:

import contextlib
from typing import BinaryIO, Callable

def _safe_wrap(raw: BinaryIO, wrapper: Callable[[BinaryIO], BinaryIO]) -> BinaryIO:
    """Wrap a stream, closing the raw handle if wrapping fails."""
    try:
        return wrapper(raw)
    except BaseException:
        # A failing close() must not mask the original wrapping error.
        with contextlib.suppress(Exception):
            raw.close()
        raise
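To make the failure mode concrete, the sketch below repeats the helper so it runs on its own and pairs it with a deliberately failing wrapper. `failing_wrapper` and `read` are hypothetical stand-ins for a wrapper constructor such as `_ErrorMappingStream` and a backend `read()`; they are not the real implementations.

```python
import contextlib
import io
from typing import BinaryIO, Callable


def _safe_wrap(raw: BinaryIO, wrapper: Callable[[BinaryIO], BinaryIO]) -> BinaryIO:
    """Same helper as above, repeated so the sketch is self-contained."""
    try:
        return wrapper(raw)
    except BaseException:
        with contextlib.suppress(Exception):
            raw.close()
        raise


def failing_wrapper(raw: BinaryIO) -> BinaryIO:
    """Hypothetical wrapper whose constructor raises."""
    raise RuntimeError("wrapper constructor failed")


def read(raw: BinaryIO, wrapper: Callable[[BinaryIO], BinaryIO]) -> BinaryIO:
    """Hypothetical backend read(): the raw handle was acquired upstream."""
    return _safe_wrap(raw, wrapper)


raw = io.BytesIO(b"payload")
try:
    read(raw, failing_wrapper)
except RuntimeError:
    pass  # the original error still propagates to the caller
# raw.closed is now True: the handle did not leak.
```

On the success path the helper is transparent: the wrapper's return value is handed back unchanged and the raw handle stays open for the wrapper to own.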

What was ruled out:

- ExitStack in __init__: over-engineering for lazy properties. "Store as self._X, close in close()" is what every comparable library does.
- ExitStack in read(): heavier than _safe_wrap for the common case.
- Async backends: different architecture (async generators). No equivalent leak pattern exists.
- Linter rules: no ruff/pylint rule can detect acquire-then-wrap without protection. The helper eliminates the pattern structurally.
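The `ResourceWarning` safety net from the deliverables table could look roughly like this. It is a sketch under assumptions: `ToyConnectionBackend` is a hypothetical class, and the real SFTP/Azure backends may track their open state differently.

```python
import warnings


class ToyConnectionBackend:
    """Hypothetical backend that owns a connection-like resource."""

    def __init__(self) -> None:
        self._closed = False

    def close(self) -> None:
        self._closed = True

    def __del__(self) -> None:
        # Safety net only: warn, never raise, when close() was skipped.
        # getattr guards against __init__ having failed before _closed was set.
        if not getattr(self, "_closed", True):
            warnings.warn(
                f"unclosed {type(self).__name__}; call close()",
                ResourceWarning,
                stacklevel=2,
                source=self,
            )
```

Python ignores `ResourceWarning` by default, so this costs nothing in production; it surfaces when running with `python -X dev` or `-W default`.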

2.5 Static Analysis for Error Handling

Verdict: One custom AST script + enable ruff BLE rules.

The SFTP error-swallowing bugs (BUG-145, 146, 147) are semantic — ruff can flag except Exception: but not except IOError: that should check errno. This requires a custom AST checker.
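The distinction a generic linter cannot see is sketched below. The function names, the injected `listdir` callable, and the locally defined `RemoteStoreError` are illustrative assumptions, as is the contract that a missing folder lists as empty.

```python
import errno
from typing import Callable, Iterable


class RemoteStoreError(Exception):
    """Stand-in for the library's base error, defined so the sketch runs alone."""


def list_names_swallowing(listdir: Callable[[str], Iterable[str]], path: str) -> list[str]:
    """Anti-pattern: permission and network failures also become an empty list."""
    try:
        return list(listdir(path))
    except IOError:
        return []


def list_names_checked(listdir: Callable[[str], Iterable[str]], path: str) -> list[str]:
    """Only 'no such file' maps to empty; every other errno is surfaced."""
    try:
        return list(listdir(path))
    except IOError as exc:
        if exc.errno == errno.ENOENT:
            return []
        raise RemoteStoreError(f"listing failed for {path!r}") from exc
```

Both handlers catch the same type, so a rule keyed on the exception class alone treats them identically; only the errno check separates them.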

Ruff gap analysis:

| Rule | Enabled? | Catches SFTP bugs? |
|---|---|---|
| `E722` (bare `except:`) | Yes | No |
| `BLE001` (`except Exception`) | No | No: only flags `Exception`/`BaseException`, not `IOError` |
| `TRY*` (tryceratops) | No | No: TRY rules cover raise style, logging, else clauses. None flag silent returns. |
| `PGH*` (pygrep-hooks) | No | No: PGH rules cover noqa/type:ignore annotations, not error handling. |
| `S110` (try/except/pass) | No | Partially: catches `pass` but not `return False` or `return []` |

Deliverable 1 — scripts/check_error_handling.py (~80 lines):

Scans src/remote_store/backends/*.py. Flags except handlers where all of the following hold:

- The caught type is broad (IOError, OSError, Exception, BaseException)
- The body silently returns (pass, return [], return False, return None)
- The handler does NOT check errno

Supports # noqa: RSE001 suppression for intentional broad catches (~3–5 in current code: exists(), is_file(), is_folder()).

Would have caught BUG-145, BUG-146, BUG-147 directly.
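A minimal sketch of such a checker, using only the standard `ast` module, is shown below. It is an assumption about how the script might be structured, not the actual `check_error_handling.py`; in particular it treats a handler as silent only when every statement in its body is a silent one, which is narrower than a production script might want.

```python
import ast

BROAD = {"IOError", "OSError", "Exception", "BaseException"}


def _caught_names(handler: ast.ExceptHandler) -> set[str]:
    """Names of the exception types an except clause catches."""
    node = handler.type
    if node is None:  # bare `except:`
        return {"BaseException"}
    elts = node.elts if isinstance(node, ast.Tuple) else [node]
    return {e.id for e in elts if isinstance(e, ast.Name)}


def _is_silent(stmt: ast.stmt) -> bool:
    """True for `pass` and for `return` of nothing, None, False, or []."""
    if isinstance(stmt, ast.Pass):
        return True
    if not isinstance(stmt, ast.Return):
        return False
    value = stmt.value
    if value is None:
        return True
    if isinstance(value, ast.Constant):
        return value.value is None or value.value is False
    return isinstance(value, ast.List) and not value.elts


def _mentions_errno(handler: ast.ExceptHandler) -> bool:
    """True if any attribute access named `errno` appears inside the handler."""
    return any(
        isinstance(n, ast.Attribute) and n.attr == "errno" for n in ast.walk(handler)
    )


def find_silent_handlers(source: str) -> list[int]:
    """Line numbers of broad handlers that return silently without checking errno."""
    lines = source.splitlines()
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ExceptHandler):
            continue
        if "noqa: RSE001" in lines[node.lineno - 1]:
            continue  # explicit suppression on the `except` line
        if (
            _caught_names(node) & BROAD
            and all(_is_silent(s) for s in node.body)
            and not _mentions_errno(node)
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

A CI wrapper would read each backend file, call `find_silent_handlers`, and exit non-zero if any line numbers come back.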

Maintenance risk: Custom AST scripts can become brittle as code patterns evolve. The two existing scripts (check_test_assertions.py, check_mock_spec.py) have been stable because they target narrow, well-defined patterns. This script targets a similarly narrow pattern (broad catch + silent return + no errno). Keep scope minimal — if it grows beyond ~100 lines or requires frequent suppression updates, reconsider in favor of review-enforced rules.

Deliverable 2 — Enable ruff BLE rule set:

One-line change in pyproject.toml. Flags except Exception without re-raise. May need ~2–3 # noqa: BLE001 suppressions.

What was ruled out:

- Pylint: ruff + custom scripts cover the gap better.
- Full native-exception-leak detection via AST: requires call-graph analysis. False-positive rate too high on LocalBackend.
- Error message format checking: low bug density.

2.6 Strategies Not Pursued

| Strategy | Reason |
|---|---|
| Fuzzing (AFL, python-afl) | Redundant with Hypothesis (§ 2.2) |
| Immutability enforcement | Narrow scope (1 bug). Fold into PBT mutation checks. |
| Type-level guarantees | Already at mypy strict. Incremental gains don't justify effort. |

3. Coverage Matrix

How each strategy maps to the 0.21.1 bug clusters:

| Bug cluster | PBT | Extended conformance | Resource safety | Static analysis |
|---|---|---|---|---|
| Cross-backend inconsistency | P4 (stateful) | Parameter combos | - | - |
| Resource leak on error path | - | Resource cleanup | `_safe_wrap` | - |
| Error swallowing | - | Error fidelity | - | `check_error_handling.py` |
| Edge-case inputs | P1–P3 | Edge-case matrix | - | - |
| Cache coherency | P4 (stateful) | Operational consistency | - | - |
| Mutation of caller data | P2 (config) | - | - | - |

All 22 bugs are covered by at least one strategy. Most are covered by two (defense in depth).

4. Implementation Priority

Ordered by value-to-effort ratio:

| # | Deliverable | Effort | Prevents | Risk |
|---|---|---|---|---|
| 1 | `_safe_wrap` helper + S3 bug fix (BUG-159) | ~20 lines | Resource leaks (4 bugs) + latent S3 bug | Very low |
| 2 | Hypothesis P4 (stateful backend model) | ~60 lines | Cross-backend + emergent bugs | Low (flaky if non-deterministic; use fixed seeds in CI) |
| 3 | Hypothesis P1–P3 (partition, config, path) | ~80 lines | Parsing/config bugs (4 bugs) | Low |
| 4 | Enable ruff `BLE` rules | 1-line config | Broad-except class | Very low |
| 5 | Extended conformance suite (6 categories) | ~300 lines | All 22 bug classes | Medium (CI duration; use marks to separate) |
| 6 | `check_error_handling.py` AST script | ~80 lines | Error swallowing (3 bugs) | Medium (maintenance burden; keep scope minimal) |
| 7 | `ResourceWarning` safety net for SFTP/Azure | ~10 lines | Debugging aid | Very low |

Items 1–4 are the highest-ROI investments (resource safety, PBT, and the one-line ruff change). Item 5 has medium ROI but needs CI-time monitoring. Item 6 is deferred until items 1–5 prove insufficient: the extended conformance error fidelity tests (item 5, category 3) may cover the same bug class with less maintenance overhead. Item 7 is a nice-to-have.

5. Relationship to Existing Infrastructure

| Existing | New strategy | Relationship |
|---|---|---|
| Conformance suite (`test_conformance.py`) | Extended conformance | Complement: same fixture, new edge cases |
| Conformance suite | PBT stateful model | Complement: conformance tests known scenarios, PBT explores random sequences |
| `check_test_assertions.py` | `check_error_handling.py` | Same pattern: AST script in CI lint job |
| `check_mock_spec.py` | `check_error_handling.py` | Same pattern |
| ruff rules | Enable `BLE` | Extension of existing config |
| `_ErrorMappingStream` | `_safe_wrap` | Same module (`_stream.py`) |
| Mutation testing (pytest-gremlins) | PBT | Complement: mutation tests kill mutants in existing code, PBT generates new inputs |
| mypy strict | (no change) | PBT and conformance tests catch what types cannot |