Changelog
All notable changes to this project will be documented in this file.
This project follows Semantic Versioning. Pre-1.0, minor bumps may contain breaking changes.
[Unreleased]
[0.27.0] - 2026-06-02
Added
- RemotePath.as_posix() (ID-196): RemotePath now offers as_posix(), returning the forward-slash key string (identical to str(path)), so pathlib muscle memory works instead of raising AttributeError. It is a method, not a property, matching pathlib.PurePath. RemotePath remains deliberately not os.PathLike: it is a remote-store key, not a local path, so os.fspath() still raises, which keeps keys from silently slipping into open() or os.path.* and targeting the local filesystem.
- Opt-in write-under-file-ancestor rejection for flat-namespace backends (ID-211): S3Backend, S3PyArrowBackend, AzureBackend (non-HNS), SQLBlobBackend, and async AsyncAzureBackend gain a reject_write_under_file_ancestor: bool = False constructor option. When enabled, write/write_atomic/open_atomic/move/copy reject a path that descends through an existing file with InvalidPath, matching the hierarchical backends' native behaviour. It is off by default because detection costs one HEAD request per slash-aligned ancestor (measured ~+9–19 ms at depth 6 on S3-moto / Azurite); a path with no slash skips the check entirely.
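The cost model of the ancestor check (one probe per slash-aligned ancestor, none for a slash-free key) can be illustrated with a minimal sketch. This is not the library's implementation: `slash_aligned_ancestors`, `check_no_file_ancestor`, the `fake_head` probe, and the local `InvalidPath` class are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Iterator, List


class InvalidPath(Exception):
    """Local stand-in for the library's InvalidPath error (assumption)."""


def slash_aligned_ancestors(key: str) -> Iterator[str]:
    """Yield each proper ancestor of ``key``, shallowest first."""
    parts = key.split("/")
    for i in range(1, len(parts)):
        yield "/".join(parts[:i])


def check_no_file_ancestor(key: str, is_file: Callable[[str], bool]) -> None:
    """Raise InvalidPath if any ancestor of ``key`` is an existing file."""
    for ancestor in slash_aligned_ancestors(key):
        # One probe per ancestor; on S3 this would be a HEAD request.
        if is_file(ancestor):
            raise InvalidPath(f"{ancestor!r} is a file; cannot write {key!r}")


existing_files = {"data/report.csv"}
probes: List[str] = []


def fake_head(path: str) -> bool:
    """Hypothetical probe that records every lookup it receives."""
    probes.append(path)
    return path in existing_files


# A key without a slash has no ancestors, so nothing is probed.
check_no_file_ancestor("notes.txt", fake_head)
probes_for_flat_key = list(probes)

# A key that descends through an existing file is rejected.
try:
    check_no_file_ancestor("data/report.csv/part-0", fake_head)
    rejected = False
except InvalidPath:
    rejected = True
```

The number of probes grows with key depth, which is consistent with the option being off by default.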
Changed
- to_key() of a bare backend root now returns the empty key "" on every backend (BK-234): S3Backend, S3PyArrowBackend, and AzureBackend (sync + async) previously returned the bare bucket/container unchanged, while LocalBackend/SFTPBackend already mapped it to "". The round-trip to_key(native_path(k)) == k now holds for the empty key on all backends, not only for non-empty keys.
- The s3fs S3 lanes (s3, s3-pyarrow) default to use_listings_cache=False (BK-257): fresh directory listings are now the default. s3fs's DirCache never expires, so a cached listing was permanently blind to writes from other clients (100% silent cross-writer staleness); the fresh-list cost is one bounded round trip. Re-enable the cache explicitly with client_options={"use_listings_cache": True}, or use ext.cache for caching with explicit invalidation.
- LocalBackend and SFTPBackend raise InvalidPath for a write/move/copy whose path descends through an existing file (ID-209): previously they leaked native exceptions (FileExistsError, NotADirectoryError, SFTP ENOTDIR) for this case (read/delete under a file ancestor map to NotFound). The cross-backend file-ancestor contract is now backed by a Valid() class invariant in the Dafny model and a conformance gate certified through the compiled oracle.
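The round-trip identity to_key(native_path(k)) == k, including the empty key, can be shown with a toy model. The two functions below are hypothetical free functions over a bucket name, not the backends' actual methods; they only illustrate why mapping the bare root to "" makes the identity hold for every key.

```python
def native_path(bucket: str, key: str) -> str:
    """Toy mapping from a store key to a bucket-qualified native path."""
    return f"{bucket}/{key}" if key else bucket


def to_key(bucket: str, native: str) -> str:
    """Toy inverse: the bare bucket maps to the empty key."""
    if native == bucket:
        return ""
    prefix = bucket + "/"
    if not native.startswith(prefix):
        raise ValueError(f"{native!r} is not under bucket {bucket!r}")
    return native[len(prefix):]


keys = ["", "a.txt", "dir/a.txt"]
round_trips = [to_key("my-bucket", native_path("my-bucket", k)) == k for k in keys]
```

If `to_key` returned the bare bucket unchanged instead, the first entry of `round_trips` would be False, which is the divergence this release removes.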
Fixed
- S3 write_atomic no longer commits a truncated object when the content source fails mid-stream (BUG-214): both S3 backends previously left a complete-looking but truncated object on a mid-stream content failure, breaking the ATOMIC_WRITE contract. S3Backend (s3fs) now discard()s the in-flight upload (aborting any multipart upload), and S3PyArrowBackend buffers the content fully before opening the upload. Plain write remains non-atomic (it may leave a partial object on failure, like the local backend). Confirmed against real AWS S3 for both backends.
- Azure HNS operations under a file ancestor raise the correct cross-backend error class (ID-213): on a real ADLS Gen2 (HNS) account, write/write_atomic/open_atomic/move/copy/delete/list_* under a file ancestor surfaced NotFound/AlreadyExists (the raw Azure SDK mapping) instead of the cross-backend InvalidPath/NotFound/empty-listing. A per-method ancestor probe (mirroring SFTP) now returns the contract-correct class; classify_azure_error is unchanged. Applied to sync and async Azure.
- SFTP file-ancestor detection on chrooted / partial-permission servers (ID-212): SFTPBackend walked the parent chain from the absolute SFTP root, stat-ing components above its base_path. On a server where an ancestor above the chroot returns SSH_FX_PERMISSION_DENIED, a genuine file-ancestor read was misclassified as a generic failure instead of NotFound, and nested writes failed with PermissionDenied. Both helpers now walk from base_path only, so the restricted ancestors are never probed.
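The buffer-before-commit approach described for S3PyArrowBackend in the write_atomic fix (BUG-214) can be sketched with an in-memory stand-in. `ToyObjectStore` and `failing_source` are hypothetical names for illustration only; the real backends talk to S3 and are not shown here.

```python
from typing import Dict, Iterable, Iterator


class ToyObjectStore:
    """In-memory stand-in for an object store (illustrative only)."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def write_atomic(self, key: str, chunks: Iterable[bytes]) -> None:
        # Drain the source completely before touching the store. If the
        # source raises here, no object is created or replaced.
        payload = b"".join(chunks)
        self.objects[key] = payload


def failing_source() -> Iterator[bytes]:
    """Content source that fails after yielding one chunk."""
    yield b"first chunk"
    raise IOError("source failed mid-stream")


store = ToyObjectStore()
store.write_atomic("ok.bin", [b"ab", b"cd"])

try:
    store.write_atomic("broken.bin", failing_source())
except IOError:
    pass  # the failure propagates, and nothing was committed
```

The trade-off of this approach is memory: the whole payload is held before the upload starts. Aborting an in-flight upload, as described for S3Backend, avoids that cost.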
Documentation
- custom-backend-guide.md conformance-suite references updated for the per-topic test layout (ID-214): the guide's two test tables linked to the deleted flat test_conformance.py / test_conformance_extended.py. They now list the eight per-topic files under tests/backends/conformance/, explain the @pytest.mark.extended_conformance marker, and link the async sibling. In-tree test docstrings naming moved files were swept in the same pass.
- Internal tracker IDs stripped from published docstrings and docs-src/ (BK-246): backlog / spec / ADR / RFC coordinates (e.g. "See spec 003 § BE-008 and ID-211") were leaking into the rendered API reference and the docs site. 178 references across 24 files were rewritten as behaviour-first prose, and a new check_no_tracker_refs lint gate (wired into hatch run lint and CI) prevents regressions.
Internal
- Spec-traceability correctness gaps closed (BK-250): added the five tests audit-015 flagged as untested shipped behaviour (S3 / S3-PyArrow non-recursive delete_folder on a non-empty folder raises DirectoryNotEmpty; ReadOnlyHttpBackend name and capability set; an ext.otel span over the open_atomic lifecycle; sequential-batch input-order preservation). The flagged HTTP capability divergence was resolved toward the code: read() returns a live streamed body on every transport (urllib / requests / httpx), so the LAZY_READ flag is truthful; spec 032 (HTTP-CON-004) and the ReadOnlyHttpBackend docstring were corrected to document it. The duplicate STORE-015 ID was renumbered (glob() → STORE-018).
[0.26.0] - 2026-05-25
Added
- RemoteStoreComputeLogManager, a Dagster compute log manager (ID-208, RFC-0014, DAG-021 -- DAG-033): ext.dagster now covers Dagster's second storage extension point. RemoteStoreComputeLogManager is a Dagster instance component, configured in dagster.yaml, that captures op/step stdout/stderr and persists it to any remote-store backend, complementing the existing IO manager. It subclasses Dagster's TruncatingCloudStorageComputeLogManager and builds its own Store from backend_type + backend_options via the shared _build_store, which now Secret-wraps credential-named options (DAG-033). The credential masking applies retroactively to the v2 DagsterStoreResource and RemoteStoreIOManager. Verified against the installed dagster 1.13.5; the RFC's assumed import paths were corrected. Install via pip install "remote-store[dagster]".
Fixed
- Broken docs.remotestore.dev links in the README and the data-lake patterns guide (BK-236): two README links and one guide link pointed at docs-site paths that never existed. A new check_docs_site_links lint gate (DOCFRAME-009) now resolves every https://docs.remotestore.dev/stable|latest/<path> link against the page set derived from build_source_map (the same source→docs-URL map the mkdocs bridge uses), offline, with no docs build and no HTTP request. The new gate surfaced three additional broken links (a research doc plus the guide above) and an unregistered example (dagster_compute_log_manager missing from examples/_categories.yml, an ID-208 ripple miss), all fixed in the same PR.
Documentation
- docs-src/reference/tested-versions.md (ID-182): new user-facing page recording the upper-bound transitive versions CI was last green against per [<extra>]. Generated from infra/drift-locks/ by drift-check render-docs; refreshed in lockstep with the scheduled drift-guard run. FEATURES.md § Install extras and the README link to it.
Internal
- Formal Verification wave: Dafny as the spec-test interlock (ID-190, ID-206, BK-196, BK-232, BK-195, BK-233, BK-231):
  - WellFormedPath predicate in the Dafny contract (ID-190, PATH-002 -- PATH-008, NPR-020): paths are no longer opaque non-empty strings; a ghost predicate WellFormedPath characterising a normalised path is a requires precondition on all 13 contract methods. NativePathRoundTrip proves NPR-020's to_key(native_path(k)) == k identity for non-empty keys; the empty-key round-trip is backend-divergent and tracked as BK-234. Ghost-only: the compiled oracle is unchanged.
  - Mechanical spec ↔ Dafny ↔ test traceability gate (ID-206): scripts/check_formal_trace.py builds a coverage matrix across spec IDs with // @spec tags in sdd/formal/*.dfy, @pytest.mark.spec conformance markers, and sdd/specs/ IDs; fails on Dafny-backed clauses with no test, tests citing absent IDs, or tags citing absent IDs. Dual-wired into hatch run lint and the CI lint job, behind a baseline of five known gaps that must shrink, never grow.
  - Metadata pinned in the Dafny Copy and Move postconditions (BK-196, BK-232, WR-013, BE-018, BE-019): both postconditions now pin fs[dst].info.metadata == old(fs)[src].info.metadata, closing the (C) gap that let MemoryBackend.dfy verify cleanly while encoding a metadata-losing copy/move.
  - copy()/move() user-metadata conformance tests (BK-195, BK-233, WR-013, BE-018, BE-019, ASYNC-018, ASYNC-019): test_metadata_round_trips_through_move_copy (sync + async) writes a file with non-empty metadata, copies/moves it, and asserts get_file_info(dst) returns the mapping verbatim; gated by the compiled Dafny oracle, parametrised over the backend registry, self-skipping backends without USER_METADATA.
  - sdd/formal/README.md path corrections (BK-231): oracle adapter and conformance-suite paths refreshed for the tests/backends/dafny/ and tests/backends/conformance/ layout.
- Scheduled CI drift guard for unbounded extra-dependency floors (ID-182): .github/workflows/drift-guard.yml runs Mondays 07:00 UTC, re-resolving each remote-store[<extra>] with pip install --upgrade --pre, diffing against infra/drift-locks/<extra>.txt, and running smoke targets from scripts/drift_smoke_map.py for any drifted extra. A single rolling GitHub issue is created / updated / auto-closed by scripts/drift_report.py. Pre-release resolutions surface in a distinct section so RCs do not look like stable drift. The workflow never edits pyproject.toml and never opens a pin-update PR: it is early warning, not automated remediation.
- benchmarks/infra/ → top-level infra/ (ID-204): the compose stack is consumed primarily by the test suite (sftp_docker / azurite conformance fixtures, test-cov-strict, every tests/e2e/* module); the old benchmarks/ path misled contributors. MinIO host ports moved off the VSCode Jupyter scan band (9000/9001 → 19100/19101; container internal port unchanged). infra/.env is now the single source of truth for local-infra ports / hosts / credentials, exposed via infra/_settings.py (stdlib only); scripts/check_infra_settings.py fails lint on any literal -p N:M outside infra/.env, dual-wired into hatch run lint and the CI lint job.
- Live HNS suites trimmed to HNS-unique cases; async conformance gaps closed (BK-182, BK-228, BK-229): after the Stage 3 cassette / replay infrastructure landed in v0.25.0, the per-backend live HNS suites duplicated happy-path coverage already exercised by conformance and were trimmed (sync 31 → 13 cases, async 33 → 12). The BK-182 inventory surfaced two async conformance gaps the deleted duplicates had been masking: iter_children and write_atomic had no async happy-path coverage in test_async_extended.py. Both were closed in the same PR.
- CI annotation silencing (BK-230): nested Node 20 deprecation warnings in verify-formal (from dafny-lang/setup-dafny-action@v1.9.1's internal pins) silenced via FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 at the job level; uv cache reservation race between test-primary and e2e resolved via per-job cache-suffix. Drift-guard artifact actions bumped to v7/v8 (Node 24).
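The diffing step of the drift guard can be sketched as a comparison of two pin sets. This is an assumption-laden illustration, not the contents of the project's scripts: the lock files are assumed to be pip-freeze-style `name==version` lines, `parse_pins` and `diff_pins` are hypothetical helpers, and the package names and versions are made up.

```python
from typing import Dict, Optional, Tuple


def parse_pins(text: str) -> Dict[str, str]:
    """Parse ``name==version`` lines, skipping blanks and comments."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        pins[name.strip().lower()] = version.strip()
    return pins


def diff_pins(
    locked: Dict[str, str], resolved: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {package: (locked_version, resolved_version)} for every change."""
    drift: Dict[str, Tuple[Optional[str], Optional[str]]] = {}
    for name in sorted(set(locked) | set(resolved)):
        old, new = locked.get(name), resolved.get(name)
        if old != new:
            drift[name] = (old, new)
    return drift


# Made-up data: pkg-a moved forward and pkg-c is a new transitive dependency.
locked = parse_pins("# last green\npkg-a==1.0.0\npkg-b==2.3.1\n")
resolved = parse_pins("pkg-a==1.1.0\npkg-b==2.3.1\npkg-c==0.3.0\n")
drift = diff_pins(locked, resolved)
```

An empty `drift` would mean the extra resolved to the same set CI was last green against, so no smoke targets would need to run for it.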
[0.25.0] - 2026-05-18
Added
- Store.list_folders(pattern=…) and AsyncStore.list_folders(pattern=…) (ID-178, STORE-014, STORE-017, DEPTH-002): glob string matched against each folder's basename via fnmatch.fnmatch. Mirrors list_files(pattern=…); composes with max_depth= (BFS traversal runs first, pattern filters what is yielded). No backend changes required.
- SFTPUtils host-key and algorithm preflight helpers:
  - SFTPUtils.scan_host_keys(host, port=22) -> str (BK-199): preflight host-key discovery for STRICT-policy callers.
  - SFTPUtils.scan_host_algorithms(host, port=22) (BK-200): raw-socket SSH KEXINIT probe for diagnosing IncompatiblePeer failures.
  - SFTPUtils.enable_ssh_rsa_compat() (BK-198): paramiko 5+ legacy-server (ssh-rsa / SHA-1) compatibility shim.
  - HostKeyPolicy(...) accepts enum-name aliases case-insensitively (BK-197).
- Azure HNS account setup guide (docs-src/guides/backends/azure-hns-setup.md): step-by-step az CLI recipe for provisioning an ADLS Gen2 account suitable for the live HNS test suite; cross-linked from the Azure backend guide and CONTRIBUTING.md.
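How a basename pattern composes with a depth-bounded breadth-first traversal in list_folders can be sketched over a toy folder tree. The function below is a hypothetical free function, not Store.list_folders itself, and it assumes max_depth=1 means "direct children only"; the library's exact depth semantics are not stated here.

```python
from collections import deque
from fnmatch import fnmatch
from typing import Deque, Dict, Iterator, List, Optional, Tuple

# Toy tree: folder path -> names of its child folders ("" is the root).
Tree = Dict[str, List[str]]


def list_folders(
    tree: Tree,
    root: str = "",
    pattern: Optional[str] = None,
    max_depth: Optional[int] = None,
) -> Iterator[str]:
    """Walk folders breadth-first; the pattern only filters what is yielded."""
    queue: Deque[Tuple[str, int]] = deque([(root, 0)])
    while queue:
        folder, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for name in tree.get(folder, []):
            child = f"{folder}/{name}" if folder else name
            # Match against the basename, not the full path.
            if pattern is None or fnmatch(name, pattern):
                yield child
            # Non-matching folders are still traversed.
            queue.append((child, depth + 1))


tree: Tree = {"": ["raw", "reports"], "raw": ["2024", "tmp"], "reports": ["2024"]}
year_folders = list(list_folders(tree, pattern="20*"))
top_level = list(list_folders(tree, pattern="r*", max_depth=1))
```

Because the traversal is independent of the filter, a folder that does not match the pattern can still contribute matching descendants.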
Fixed
Azure HNS correctness on real ADLS Gen2
A coordinated set of fixes against real Hierarchical Namespace accounts, surfaced by the new Stage 3 live HNS test suite. Azurite forgave each of these; real HNS rejected them or silently corrupted state. Sync and async siblings are kept in lockstep throughout.
- File-API data loss (data-loss fix) (BUG-197, BE-006, BE-007, BE-012, ASYNC-006, ASYNC-007, ASYNC-012, BE-021): AzureBackend and AsyncAzureBackend read, read_bytes, read_seekable, and delete now probe hdi_isfolder before invoking the SDK; they raise InvalidPath instead of silently returning b"" or destroying the directory marker. The delete regression was a data-loss defect: a file-API delete() on what the caller believed was a file but was actually an HNS directory destroyed account state without surfacing an error. Perf note: sync read() and delete() on HNS each add one HEAD round-trip per call; async read_bytes/delete reuse the same SDK response so they pay no extra RTT.
- write_atomic streaming-input path (BUG-194, BUG-202, BE-010, WR-001a): both AzureBackend.write_atomic and AsyncAzureBackend.write_atomic streaming paths now drive the DataLake DFS append protocol directly (create_file → per-chunk append_data(offset, length) → flush_data(position)) instead of upload_data with an unseekable wrapper. Closes the MissingRequiredQueryParameter error. Memory is bounded to _AZURE_BLOCK_SIZE per chunk.
- Directory-vs-file error fidelity across the public surface:
  - get_file_info (BUG-195, BE-016, ASYNC-016) raises InvalidPath (not NotFound) for a directory path.
  - is_folder (BUG-203, BE-005, ASYNC-005) returns False (not True) for an HNS file path; both branches now inspect hdi_isfolder to distinguish a directory marker from a regular file.
  - get_folder_info (BUG-198, BUG-199, BE-017, ASYNC-017) raises InvalidPath for a file path; recursive file_count no longer counts HNS directory markers as files.
  - delete_folder (BUG-198, BE-014, ASYNC-013) raises InvalidPath (not DirectoryNotEmpty/NotFound) for a file path.
  - move/copy (BUG-200, BE-018, BE-019, ASYNC-018, ASYNC-019) raise InvalidPath (not RemoteStoreError(InvalidInput)/AlreadyExists) when the source or destination is an HNS directory.
  - open_atomic (BUG-192, BE-021) raises InvalidPath (not AlreadyExists) when the target is an HNS directory; both overwrite=False and overwrite=True covered.
  - write/write_atomic (BUG-190, BE-008, BE-010, ASYNC-008, ASYNC-010, ASYNC-024) raise InvalidPath (not AlreadyExists) when the target is an HNS directory.
- get_folder_info("") root path (BUG-213, BE-017, ASYNC-017): no longer fails with "Please specify a file system name and file path"; the HNS branch now skips the per-path get_directory_client probe when azure_path == "".
- move(p, p) / copy(p, p) self-op (BUG-201, BE-018, BE-019, ASYNC-018, ASYNC-019): both sync and async siblings short-circuit as a no-op (previously both raised AlreadyExists); a get_blob_properties() precheck preserves NotFound for a missing source, and an hdi_isfolder check preserves InvalidPath for an HNS directory path.
- Store.move/copy and AsyncStore.move/copy self-op error type (BK-227, BE-018, BE-019, BE-021): now raise InvalidPath (was NotFound) when the source path is a directory and src == dst. The short-circuit probes is_file first (1 RTT for the file no-op case), then is_folder to distinguish a directory source from a missing source. Surfaces the HNS-correctness fixes above through the Store layer.
- Async write_atomic post-rename quirk (BUG-196, WR-001a, WR-004, AZ-034): AsyncAzureBackend.write_atomic tolerates a post-rename get_file_properties() failure by returning WriteResult(etag=None, last_modified=None) and logging a warning, mirroring the sync sibling's BUG-173 pattern.
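The create_file → append_data → flush_data sequence named in the write_atomic streaming fix can be sketched against an in-memory stand-in. `FakeFileClient` only borrows the method names of the Azure DataLake file client; `stream_upload` and the tiny block size are hypothetical and chosen for illustration, not taken from the backend.

```python
import io
from typing import BinaryIO, List, Tuple


class FakeFileClient:
    """In-memory stand-in whose method names mirror a DataLake file client."""

    def __init__(self) -> None:
        self.calls: List[Tuple] = []
        self.committed = b""
        self._staged = bytearray()

    def create_file(self) -> None:
        self.calls.append(("create_file",))
        self._staged = bytearray()

    def append_data(self, data: bytes, offset: int, length: int) -> None:
        if offset != len(self._staged):
            raise ValueError("append offset must equal bytes staged so far")
        self.calls.append(("append_data", offset, length))
        self._staged += data[:length]

    def flush_data(self, offset: int) -> None:
        # Flushing at ``offset`` commits everything staged up to that position.
        self.calls.append(("flush_data", offset))
        self.committed = bytes(self._staged[:offset])


def stream_upload(client: FakeFileClient, source: BinaryIO, block_size: int) -> int:
    """Upload ``source`` chunk by chunk; only one chunk is held at a time."""
    client.create_file()
    position = 0
    while True:
        chunk = source.read(block_size)
        if not chunk:
            break
        client.append_data(chunk, offset=position, length=len(chunk))
        position += len(chunk)
    client.flush_data(position)
    return position


fake = FakeFileClient()
total = stream_upload(fake, io.BytesIO(b"abcdefghij"), block_size=4)
```

The source only needs read(); it is never rewound, which is why this shape suits unseekable streaming input.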
Other
- SFTPBackend.exists() / is_file() / is_folder() error swallowing (BUG-211): no longer treat non-ENOENT OSErrors as "not found"; connect-time PermissionError (and any other unexpected OSError) now surfaces through _errors() as PermissionDenied / BackendUnavailable instead of returning a misleading False.
- SFTPBackend inline known_host_keys on Windows (BUG-209): the helper used tempfile.NamedTemporaryFile(delete=True), whose Windows O_TEMPORARY lock prevented paramiko from re-opening the file, silently bypassing STRICT host-key verification.
- S3Backend.check_health() silent no-op (BUG-208): unawaited aiobotocore coroutine made the probe a silent no-op.
- AsyncMemoryBackend error fidelity for type-mismatched paths (BUG-189, ASYNC-006, ASYNC-007, ASYNC-013, ASYNC-016, ASYNC-017, ASYNC-018, ASYNC-019): read, read_bytes, and get_file_info raise InvalidPath (not NotFound) when the path names an existing directory; get_folder_info raises InvalidPath when the path names an existing file; delete_folder raises InvalidPath when the path is a file (regardless of missing_ok); move/copy raise InvalidPath when the source is a directory; copy(src, src, overwrite=False) is a no-op instead of raising AlreadyExists. Matches sync MemoryBackend; AsyncBackend ABC docstrings updated accordingly.
- MemoryBackend.copy() and AsyncMemoryBackend.copy() drop user metadata (BK-192, BE-019, ASYNC-019, WR-013): metadata now preserved on the destination; fixes a silent metadata drop on write → copy → get_file_info.
- AsyncMemoryBackend metadata round-tripping (BK-176, ASYNC-016, WR-013): metadata now preserved through get_file_info, list_files (recursive and non-recursive), and iter_children, for sync MemoryBackend parity.
- Benchmark SVG images broken on the performance docs page (BUG-188).
- EthicalAds ad floating over the API graph viz canvas on RTD (BUG-187).
- API graph visualization blank on iOS Safari (BUG-186).
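The principle behind the SFTPBackend.exists() fix (only "no such file" means "not found"; every other OSError must surface) can be shown with a small sketch. The `exists` helper and the two fake stat functions are hypothetical; the real backend additionally maps the surfaced errors to its own exception classes, which is not shown.

```python
import errno
from typing import Callable


def exists(stat: Callable[[str], object], path: str) -> bool:
    """Return False only for ENOENT; re-raise every other OSError."""
    try:
        stat(path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            return False
        raise  # permission and connection problems must not look like "missing"
    return True


def stat_missing(path: str) -> object:
    raise FileNotFoundError(errno.ENOENT, "No such file", path)


def stat_denied(path: str) -> object:
    raise PermissionError(errno.EACCES, "Permission denied", path)


missing = exists(stat_missing, "a.txt")

try:
    exists(stat_denied, "a.txt")
    surfaced = False
except PermissionError:
    surfaced = True
```

A blanket `except OSError: return False` would make `stat_denied` indistinguishable from `stat_missing`, which is the misleading False the fix removes.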
Changed
- [sftp] extra now requires paramiko>=3.0 (BUG-204): for the channel_timeout= connect kwarg used by SFTPBackend. paramiko 2.x floor lifted. Migration: environments pinning paramiko<3 must upgrade.
- pyarrow cap lifted to <25 across all extras (BK-168, BK-172): s3-pyarrow and related extras now allow pyarrow 24.x; S3-PyArrow conformance tests are routed to MinIO on pyarrow ≥ 24; moto[server,s3]>=5.2.0 required for multipart compatibility under pyarrow 23.
- hatch run test-cov no longer enforces --cov-fail-under=95: the strict gate moved to a new hatch run test-cov-strict script. Release publishing and CI run the strict variant; local test-cov is now a coverage report only.
Documentation
- aio.md leads with AsyncStore (ID-192): full per-category method sections mirroring store.md. The four members: false stubs (SyncBackendAdapter, AsyncBackendSyncAdapter, AsyncMemoryBackend, AsyncAzureBackend) now render their full member surface, surfacing the layer-4 Raises: docstrings introduced by BK-173.
- Async docstring ripple completed (BK-173, BK-174): nine I/O methods on SyncBackendAdapter gain Raises: clauses mirrored from the AsyncBackend ABC; InvalidPath documented on async write/write_atomic across the AsyncBackend ABC, SyncBackendAdapter, and AsyncMemoryBackend.write_atomic. Surfaces in help() and IDE hover.
- SFTPUtils rendering (BK-202): helpers documented as true @staticmethod; correct meth rendering, signatures restored on docs.remotestore.dev.
- RST cross-reference roles in audit-013-touched files corrected (BK-178).
Internal
- Stage 3 live cloud test infrastructure (BK-175, BK-179, BK-180, BK-181, BK-184, BK-191, BK-204, BUG-182, BUG-191, BUG-193, BUG-210, BUG-212):
  - Spec 048 Phase 1: fixture registry + conformance reorganisation (BK-179).
  - Spec 048 Phase 2: azure_live and azure_live_async Stage 3 conformance fixtures with BackendFixture.aclose async cleanup channel (BK-180).
  - Spec 048 Phase 3: HTTP cassette/replay layer for Stage 2 Azure coverage (BK-181).
  - s3_live Stage 3 conformance fixture (BK-184).
  - SFTP-007 host-key resolution chain (config / env / STRICT-file tiers) coverage (BK-204).
  - Live ADLS Gen2 integration test classes for write_atomic metadata survival (BUG-182) and write/write_atomic/open_atomic directory-path guards (BUG-191); async-side gap closed and sync HNS live tests gained WriteResult assertions (BUG-193).
  - azure_replay fixture missing cleanup= caused ~133 phantom Unclosed AzureBackend warnings per Stage-2 run; fixed (BUG-210).
  - scripts/record_cassettes.py no longer deletes cassettes before validating env (BUG-212).
  - Stage 3 cassettes refreshed after PR #650; empty _AZURE_HNS_KNOWN_FAILURE_FN_NAMES roster (BK-224).
- Physical fixture/backend registry as single source of truth (BK-185, BK-186): per-fixture flat-namespace and self-op flags replace the identity-keyed sets, eliminating drift.
- tests/ root cleanup (BK-188, BK-189, BK-190, BK-191, BK-215–BK-222): backend-specific evictions and seekable rename (BK-188); tests/ext/ package + ext-module moves (BK-189); placement checks (rules S, B, E) + TESTING.md and spec 048 update (BK-190); the _BACKEND_AT_ROOT_GRANDFATHERED allow-list audit completed in six slices reshaping test_config, test_ping, test_depth_listing, test_seekable, test_pbt_write_result, test_coverage_gaps per-backend; test_examples.py allow-list justification documented (BK-215).
- End-to-end S3 control-path coverage (BK-166, S3-026, S3PA-026): tests/backends/test_s3_moto.py drives the full write/list_files/read/delete lifecycle for both S3Backend and S3PyArrowBackend against a ThreadedMotoServer with the tuned client_options shape that triggered BUG-178 and BUG-185. Runs in the default suite so a regression in the config_kwargs routing surfaces immediately.
- hatch run all performance (ID-195): pytest-xdist, preflight, and SFTP-Docker carve-out applied; pre-commit gate stays fast.
- Per-topic mutate-conformance-* scopes (BK-183): Windows-compatible mutation-testing topic scopes.
- Self-op test parametrization + tighter match= regexes (BK-177, BK-223).
- Documentation framework tooling (BK-167, BK-167a, BK-167b, BK-169, BK-170, BK-171, BK-205, ID-175):
  - ADR-0027, Spec 047 (docs framework tooling contracts), and sdd/AUTHORING.md define the framework (BK-167a).
  - Docs-framework bridge: scan_dual_files + render_dual_pages, explanation/design/ URL alignment, nav restructure, --strict CI gate restored, audit-012 closed (BK-167b).
  - Universal on-disk link rule (DOCFRAME-008): mkdocs_hooks.py applies LinkResolver to every docs-src/ file at build time; check-links collapses to a single mode that walks every git-tracked .md; SDD kind rules hoisted to docs-src/_path_rules.yml (BK-171).
  - Five spec-traced pytest tests for the DOCFRAME-004 gate (BK-169).
  - API graph visualization hosted in the docs Explanation section (BK-170).
  - Authoring templates folder at sdd/templates/ (ID-175).
  - check_rst_roles and check_docs_framework wired into the CI lint job (BK-205).
  - Docs structure audit for the post-ID-174 layout (BK-165).
- Coalesced azure.core.exceptions imports across the Azure backend (BK-226).
- CI / build hygiene: Node.js 20 → 24 audit closed as no bump needed (BK-206); non-package tests scoped to Python 3.13 in the CI matrix (BK-207); CI Python version centralised + primary-Python jobs split (BK-219); lint/format/typecheck scope expanded to scripts/ and examples/ (BK-187); gen-checks dual-wired into hatch run lint (BK-203); mutation-testing matrix shard pytest venv fix (BUG-207); mutation-testing scheduled cron setup-job fix (BUG-206).
- SFTP test-hygiene: TESTING.md Rule 3 violations on SFTPBackend private state removed (BK-201).
- docs-src/context7.json (ID-176): claims https://docs.remotestore.dev/stable/ on context7 and supplies the full rules array.
- Long-term docstring style enforcement design (ID-177).
- Ripple-check rewrite (BK-194, BK-193): compact pre-work index + detailed verify checklist; trace schema gains audience field and post-hoc fields; unreleased traces re-tagged.
[0.24.1] - 2026-04-30
Added
- CAPABILITIES: ClassVar[CapabilitySet] on every backend and ABC (ID-159, BE-003): Backend, AsyncBackend, all built-in backends, and SyncBackendAdapter now declare a class-level CAPABILITIES attribute exposing the capability set without requiring instantiation. The capabilities property delegates to self.CAPABILITIES so the class view and the instance view always agree. Conformance tests enforce instance.capabilities <= Cls.CAPABILITIES for every backend; for SQLBlobBackend, CAPABILITIES is the upper bound and narrow-column schemas may yield a strictly smaller instance set. Custom backends should follow the same pattern (see docs-src/guides/custom-backend-guide.md).
- _GATING: dict[str, Capability] in _store.py (ID-159): Single source of truth for the method → capability mapping read by Store._gate(). Replaces the previous scattered gate logic; the new _BACKEND_GATING: dict[str, str] in scripts/gen_graph.py plays the same role for Backend.
- __mirror__: ClassVar[type[...]] on async backends (ID-159): AsyncMemoryBackend and AsyncAzureBackend now point at their sync peer via __mirror__, enabling static extraction of mirrors edges in the API graph.
- RFC-0012: Documentation Graph Model (ID-159, accepted): graph schema, snapshot rules, and projection contract for the graph.json artifact and downstream generators.
- Documentation API graph generator (scripts/gen_graph.py, ID-159): emits docs-src/_data/graph/graph.json with capability/class/extra/method/requirement/package nodes and declares/gates/of/enables/mirrors/inherits edges. Method nodes carry is_abstract, is_async, file, line (schema 1.1, ID-164); mirrors edges carry capability_delta: {async_only, sync_only} so consumers can render sync↔async asymmetries. For example, AsyncMemoryBackend declares LAZY_READ and MemoryBackend does not, so the edge reports async_only: ["LAZY_READ"] (schema 1.2, ID-162). source_version and snapshot are read from pyproject.toml, not hardcoded (ID-163). gen-graph / gen-graph-check hatch scripts.
- FEATURES.md projection from API graph (scripts/gen_features.py, ID-163): regenerates the mechanical sections (backends_main, backends_flags, install_extras) from graph.json between <!-- BEGIN_GENERATED --> / <!-- END_GENERATED --> markers; rows are sorted alphabetically (ID-169) instead of by source-file declaration order. gen-features / gen-features-check hatch scripts; release Phase 2 runs gen-graph then gen-features after bump-my-version.
- API-docs verifier (scripts/check_api_docs.py, ID-170, ID-171): walks graph.json and docs-src/api/store.md / backend.md in parallel through the same canonical mapping {method: frozenset(required_capabilities)} and flags missing ::: directives or capability admonitions that drift from _GATING / _BACKEND_GATING. First catch: a !!! note "Requires Capability.GLOB" admonition placed before ::: Store.glob in store.md, which was moved per the file's own placement-rule comment. gen-api-check hatch script wired into the CI lint job.
- Interactive graph visualisation (scripts/gen_graph_viz.py, ID-165): self-contained D3 v7 force-directed HTML rendered from graph.json and committed at docs-src/_data/graph/graph_viz.html. Nodes are colour-coded by kind; edges styled by type with directional arrowheads; abstract methods are dashed; async methods carry a small badge. Sidebar filter checkboxes, click-to-inspect detail panel, drag/zoom/pan. gen-graph-viz / gen-graph-viz-check hatch scripts.
- scripts/check_test_placement.py (ID-168): AST-based lint check enforcing the test subpackage placement rule formalised in sdd/TESTING.md § Test Subpackage Placement. Wired into the lint CI job and the check-test-quality hatch script.
- scripts/check_tla_no_emdash.py: CI guard rejecting non-ASCII em dashes in TLA+ and Dafny formal files; TLC's lexer treats U+2014 as a hard error.
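The class-level CAPABILITIES pattern from the first entry of this list can be sketched as follows. All names are local stand-ins rather than imports from the library: `Capability` is a toy enum, `CapabilitySet` is modelled as a frozenset, and `ToyNarrowBackend` is a hypothetical analogue of a backend whose instance set can be narrower than the class-level upper bound.

```python
from enum import Enum, auto
from typing import ClassVar, FrozenSet


class Capability(Enum):
    """Toy capability enum (stand-in, not the library's)."""

    READ = auto()
    WRITE = auto()
    USER_METADATA = auto()


CapabilitySet = FrozenSet[Capability]


class ToyBackend:
    # Readable without instantiating the backend.
    CAPABILITIES: ClassVar[CapabilitySet] = frozenset(
        {Capability.READ, Capability.WRITE}
    )

    @property
    def capabilities(self) -> CapabilitySet:
        # The instance view delegates to the class view by default.
        return self.CAPABILITIES


class ToyNarrowBackend(ToyBackend):
    # Upper bound; a given instance may support less.
    CAPABILITIES: ClassVar[CapabilitySet] = frozenset(
        {Capability.READ, Capability.WRITE, Capability.USER_METADATA}
    )

    def __init__(self, has_metadata_column: bool) -> None:
        self._has_metadata_column = has_metadata_column

    @property
    def capabilities(self) -> CapabilitySet:
        if self._has_metadata_column:
            return self.CAPABILITIES
        return self.CAPABILITIES - {Capability.USER_METADATA}


wide = ToyNarrowBackend(has_metadata_column=True)
narrow = ToyNarrowBackend(has_metadata_column=False)
```

The subset check `instance.capabilities <= Cls.CAPABILITIES` holds for both instances, and is strict for `narrow`.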
Fixed
- S3Backend(client_options={"client_kwargs": {"config": Config(...)}}) raised TypeError: got multiple values for keyword argument 'config' (BUG-185): s3fs's set_session() always calls aiobotocore.create_client("s3", config=AioConfig(**self.config_kwargs), **client_kwargs), so any client_kwargs["config"] injected by the BUG-178 fix duplicated config=. Reproduced on s3fs 2026.3.0 against an internal MinIO-style endpoint requiring s3.addressing_style="path" and proxies={http: None, https: None}. Fixed by routing every botocore.config.Config option through opts["config_kwargs"] (a plain dict of Config(...) constructor kwargs); client_kwargs["config"] is never set, and a caller-supplied pre-built Config in client_kwargs is rejected at backend construction with ValueError pointing at the supported channel. Silent rewriting hid both this bug and BUG-178 and is no longer permitted. Spec S3-026 / S3PA-026 rewritten. Tests added at the actual collision boundary (TestAiobotocoreCreateClientBoundary patches aiobotocore.session.AioSession.create_client and triggers s3fs.connect()), so a future variant of the same bug class fails the unit suite. New "Botocore Client Tuning" section in docs-src/guides/backends/s3.md documents proxies, retries, timeouts, and MinIO path-style addressing; runnable snippets in examples/snippets/s3_botocore_tuning.py are wired into tests/test_snippets.py and the examples gate. Follow-up moto-backed e2e coverage tracked as BK-166. Migration: callers that passed a pre-built botocore.config.Config via client_options={"client_kwargs": {"config": Config(...)}} must switch to client_options={"config_kwargs": {...}} (a plain dict of the same Config(...) constructor kwargs). The old form raised TypeError at first I/O on s3fs ≥ 2024.x already; it now fails fast with ValueError and a message naming the supported channel.
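The fail-fast rule from the migration note can be sketched as a small validator. `validate_client_options` is a hypothetical helper, not the backend's code, and the error message wording is invented; only the shape of `client_options` (the `client_kwargs` and `config_kwargs` keys, and the path-style and proxy settings) comes from the entry above.

```python
from typing import Any, Dict, Mapping


def validate_client_options(client_options: Mapping[str, Any]) -> Dict[str, Any]:
    """Reject a pre-built config in client_kwargs; accept config_kwargs."""
    client_kwargs = client_options.get("client_kwargs") or {}
    if "config" in client_kwargs:
        raise ValueError(
            "client_kwargs['config'] is not supported; pass the Config "
            "constructor kwargs via client_options['config_kwargs'] instead"
        )
    return dict(client_options)


# Supported form: a plain dict of Config(...) constructor kwargs.
tuned = validate_client_options(
    {
        "config_kwargs": {
            "s3": {"addressing_style": "path"},
            "proxies": {"http": None, "https": None},
        }
    }
)

# Old form: rejected at construction time, before any I/O happens.
try:
    validate_client_options({"client_kwargs": {"config": object()}})
    old_form_rejected = False
except ValueError:
    old_form_rejected = True
```

Rejecting the old form up front, rather than silently rewriting it, keeps the failure at the call site that needs to change.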
Changed
- Documentation filesystem reorganised along Diátaxis (ID-174): All prose moved from guides/ and the repo root into docs-src/<bucket>/ (how-to, explanation, reference, further); the intermediate docs/ layer was collapsed and removed in the same release. Cross-bucket links across docs-src/api/ stubs, extension stubs, 26 example docstrings, and scripts/docs/render.py were updated; absolute GitHub URLs are now used for links to repo files outside docs-src/ (sdd/, CONTRIBUTING.md). mkdocs build --strict passes with 0 warnings. Bookmarks to specific guide URLs may need updating.
Internal
- Async backends moved into aio/backends/ subpackage to mirror the sync backends/ layout. Public imports through remote_store.aio and remote_store.aio.backends are unchanged; only direct imports of private modules (e.g. remote_store._async_memory) are affected.
- Test subpackage consolidation (ID-166, ID-167, ID-168): tests/test_gen_graph.py → tests/scripts/test_gen_graph.py, tests/backends/test_dafny_classorder.py → tests/scripts/test_dafny_classorder.py, tests/test_gen_features.py → tests/scripts/test_gen_features.py; ROOT anchors corrected for the new depth.
- Context7 indexing (ID-160): context7.json schema fixes; library registered at /haalfi/remote-store (691 snippets, source-reputation High, benchmark score 91.3, version 0.24.0; verified 2026-04-29).
- CodeQL py/overly-complex-delete alert #55: AsyncAzureBackend.__del__ refactored to delegate the open-clients check to a new private _has_open_clients() helper.
- TLA+ toolchain pinning: tla2tools.jar pinned to v1.7.4 in ci.yml, tlc.Dockerfile, and scripts/tlc_check.sh (v1.8.0 was a pre-release with inconsistent checksums); em dashes removed from TLA+ and Dafny formal files.
[0.24.0] - 2026-04-26¶
Added¶
- WriteResult: write methods return rich metadata; Store.head(); user metadata; hashing helpers; async parity (ID-146, ID-148, ID-013b): The entire write surface now returns a structured result and accepts optional user metadata.
  - WriteResult dataclass: every write*() call returns WriteResult(path, size, source, digest, etag, version_id, last_modified, metadata). The source field signals origin: NativeSource (from the backend's write response), BasicSource (from a post-write stat), or SidecarSource (from ext.write). Two new capabilities gate the rich fields: WRITE_RESULT_NATIVE (backend populates etag, digest, version_id, last_modified from its own response) and USER_METADATA (caller-supplied metadata= is persisted).
  - Store.head(path): retrieves file metadata as a WriteResult without reading content; gated on Capability.METADATA.
  - ext.write: write_with_hash and open_atomic_with_hash guarantee a client-side SHA-256 digest in WriteResult.digest regardless of whether the backend declares WRITE_RESULT_NATIVE; suitable for integrity-critical pipelines.
  - Async parity: AsyncStore.write*() and AsyncBackend.write/write_atomic return WriteResult and accept metadata=; Capability.USER_METADATA enforced at the AsyncStore layer; aio.ext.write.write_with_hash mirrors the sync helper.
  - Proxy forwarding: ProxyStore, ObservedStore, and CachedStore all forward WriteResult and head(); StoreEvent.metadata["write_result"] is populated on successful writes.
  - Docs: Write Integrity guide; RFC-0011 (Implemented).
- AsyncBackendSyncAdapter (ID-141–143c): new public class wrapping any AsyncBackend as a synchronous Backend via a private event loop on a dedicated daemon thread. Design in ADR-0025; spec 029 § AsyncBackendSyncAdapter (ASYNC-080…093) covers streaming read/list pumps, write bridging, open_atomic synthesis, capability translation, fail-fast guard for running-loop callers, bounded shutdown, and GC-path cleanup (_ChunkPullReader as io.RawIOBase; best-effort __del__ on _AsyncIteratorBridge). Full unit suite in tests/aio/ (every test traced to spec IDs), Azurite-backed integration suite for the full sync Backend contract, and bridged-Azure variant in the e2e streaming chain. Decision guide: guides/async-sync-bridges.md. Unblocks ID-127 (Graph backend).
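The bridging idea behind the adapter entry above (a private event loop on a dedicated daemon thread, a fail-fast guard for running-loop callers, bounded shutdown) can be sketched with the standard library alone. This is a minimal illustration, not the adapter's implementation: LoopThreadBridge and fetch_bytes are hypothetical names introduced here, and the 5-second shutdown bound is an arbitrary assumption.

```python
import asyncio
import threading
from typing import Any, Coroutine, Optional


class LoopThreadBridge:
    """Hypothetical helper: runs coroutines on a private loop in a daemon thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: Optional[float] = None) -> Any:
        # Fail fast when the caller is already inside a running event loop:
        # blocking on the result here would stall that loop.
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            pass  # no running loop in this thread, safe to block
        else:
            coro.close()
            raise RuntimeError("cannot block inside a running event loop")
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self, timeout: float = 5.0) -> None:
        # Bounded shutdown: ask the loop to stop, wait a limited time for the thread.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout)
        if not self._thread.is_alive():
            self._loop.close()


async def fetch_bytes(key: str) -> bytes:
    # Stand-in for a real async backend call (assumption for this sketch).
    await asyncio.sleep(0)
    return key.encode()


bridge = LoopThreadBridge()
result = bridge.run(fetch_bytes("a/b.txt"))
bridge.close()
```

Synchronous callers only ever see bridge.run(...); the coroutine itself executes on the bridge's own thread, so it never competes with a loop owned by the caller.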
Fixed¶
- AsyncMemoryBackend.delete raises InvalidPath for directory paths (BUG-184): When a directory path was passed with missing_ok=True, the backend silently returned instead of raising InvalidPath, diverging from sync MemoryBackend (BE-012) and spec ASYNC-012. Fixed by inserting the isinstance(existing, _DirNode) guard mirrored from _memory.py:204-205; spec ASYNC-012 tightened to pin the outcome; the PBT guard that previously suppressed the directory-path case is removed.
- s3fs lazy init raises "got multiple values for keyword argument 'config'" (BUG-178): When client_options={"config_kwargs": {...}} and retry=RetryPolicy(...) were both supplied, aiobotocore.create_client() received two config= arguments. Fixed by extracting _S3Base._build_s3fs_kwargs(), which merges config_kwargs into a single botocore.config.Config before the retry-derived config is applied; both S3Backend and S3PyArrowBackend delegate to the shared builder.
- SQLBlobBackend.glob drops zero-segment **/ matches on SQLite (BUG-175): SQLite's GLOB operator treated ** as two independent *s and required a literal / between them, dropping zero-directory-depth matches. Replaced with extract_prefix + LIKE narrowing; the existing Python regex handles final filtering.
- SQLBlobBackend.copy(src, src) no longer silently destroys data (BUG-176): copy() lacked the src == dst early-return guard that move() has. With overwrite=True the single row was deleted before the INSERT ... SELECT ran, destroying the file; with overwrite=False the method incorrectly raised AlreadyExists. Fixed by mirroring move()'s guard at the top of copy(): verify source exists, then return immediately.
- AsyncAzureBackend.write streaming (BUG-165): write and write_atomic materialized any AsyncIterable[bytes] payload into a single bytes buffer before calling upload_blob/upload_data, holding the entire file in memory and breaking the streaming contract (SIO-003, ASYNC-021). The async iterator is now passed through: the Azure SDK accepts AsyncIterable[bytes] directly and streams it in bounded memory.
- Docs pages deployment on release tags (BUG-164): pages job moved to a dedicated gh-pages-deploy.yml workflow triggered by workflow_run on Docs completion. Eliminates the github-pages environment protection rule failure when docs.yml ran in a tag ref context on release events.
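The BUG-176 failure mode above is easy to reproduce outside the library. The sketch below is a stand-in under stated assumptions: the two-column blobs table and the copy_blob function are hypothetical (not SQLBlobBackend's schema or code), and built-in exceptions stand in for NotFound and AlreadyExists. It shows why the src == dst guard has to run before the overwrite delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (key TEXT PRIMARY KEY, data BLOB)")
conn.execute("INSERT INTO blobs VALUES ('a.txt', x'68656c6c6f')")  # b"hello"


def copy_blob(conn: sqlite3.Connection, src: str, dst: str, overwrite: bool = False) -> None:
    """Hypothetical copy over a key/blob table (illustration only)."""
    found = conn.execute("SELECT 1 FROM blobs WHERE key = ?", (src,)).fetchone()
    if found is None:
        raise FileNotFoundError(src)
    if src == dst:
        # Guard: the source exists, so a self-copy is a no-op. Without this
        # return, the DELETE below would remove the only row before the
        # INSERT ... SELECT could read it.
        return
    exists = conn.execute("SELECT 1 FROM blobs WHERE key = ?", (dst,)).fetchone()
    if exists and not overwrite:
        raise FileExistsError(dst)
    conn.execute("DELETE FROM blobs WHERE key = ?", (dst,))
    conn.execute(
        "INSERT INTO blobs (key, data) SELECT ?, data FROM blobs WHERE key = ?",
        (dst, src),
    )


copy_blob(conn, "a.txt", "a.txt", overwrite=True)  # self-copy keeps the row
copy_blob(conn, "a.txt", "b.txt")                  # ordinary copy still works
rows = conn.execute("SELECT key, data FROM blobs ORDER BY key").fetchall()
```

Checking that the source exists before returning keeps a self-copy of a missing key an error rather than a silent success.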
Changed¶
- pyarrow 24.x mypy compatibility (BK-154): pyarrow 24.0.0 shipped partial type stubs that surfaced attr-defined, name-defined, and no-untyped-call errors under mypy strict mode. Added follow_imports = "skip" for pyarrow/pyarrow.* in pyproject.toml, restoring pre-24 behaviour where all pa.* resolves as Any. Removed the now-redundant # type: ignore[import-untyped] annotations on pyarrow imports in ext/arrow.py, ext/parquet.py, backends/_s3_pyarrow.py, and backends/_sqlalchemy.py.
Documentation¶
- Documentation gaps from Audit-011 resolved (BK-162): Fixed 16 findings across four areas: custom-backend guide and snippet updated for WriteResult return type, metadata= kwarg, and new capability descriptions (USER_METADATA, WRITE_RESULT_NATIVE, LAZY_READ); async.md gained a Write Results section, aio.ext.write prose, and an Async-Sync Bridges cross-reference; extensions.md added the aio.ext.write table row and import stubs; s3.md, azure.md, and local.md received Write Results sections and corrected USER_METADATA claims. Audit report: sdd/audits/audit-011-docs-v023-gaps.md.
- SFTPGo compatibility note in README and SFTP guide: documents SFTPGo as a zero-dependency SFTP server for local development and CI, with a comparison table against OpenSSH-server.
- Backend-specifics visibility in API reference (BK-153): added a three-tier admonition vocabulary (info/note/warning) across all docs-src/api/ pages: capability-gate notes on B-series methods, backend-conditional argument notes on metadata= and max_depth=, conditional field notes on FileInfo, WriteResult, FolderInfo, and BackendConfig.options, and interop-section warnings on Backend, AsyncBackend, AsyncStore, ProxyStore, ReadOnlyHttpBackend, and SFTPUtils. Vocabulary documented in sdd/DOCUMENTATION.md. Closed all 20 findings from audit-009.
- Docs site spacing and typography (BK-157): custom CSS reduces whitespace across all pages (table cell padding halved, headings follow the classic generous-above/tight-below asymmetry, parameter/returns blocks and bullet lists are compact, adjacent heading-before-method gaps collapsed).
- Content rule 6, code examples are sourced, not written (ID-144): codifies the existing examples/snippets/ practice (ID-057, ID-106) in sdd/CONTENT-RULES.md so future doc PRs pull tested snippets rather than hand-writing fences.
- Formal-layer principles: Dafny vs TLA+ (ID-147): rewrites sdd/formal/README.md around the repo's stance: Dafny for per-operation contracts, TLA+ for cross-layer protocol properties, decoupled rather than embedded. Adds the spec-decomposition authoring rule (write the TLA+ invariant first on multi-conjunction spec items) and four authoring rules (stand-alone modules, demonstrated bundling, CI-informational with cadence-based revisit, promotion gated on a real regression catch).
- OBS-003 step 6 outcome clarification (ID-147): spec 019 step 6/7 now state explicitly that on_<op> and on_any fire regardless of outcome, cross-referencing OBS-004. Matches existing code and tests; removes a drift surfaced by the OBS-003 hand-decomposition exercise (sdd/research/research-id-147-obs003-decomposition.md § 2).
- Observer.tla and informational verify-tla CI (ID-147): starts the live informal TLA+ layer under sdd/formal/tla/ with Observer.tla: six invariants covering OBS-003 step 6/7 and OBS-003a dispatch routing, each paired with a break-and-catch row confirming orthogonality of the seeded mutations (the module is an authoring artefact per sdd/formal/README.md rule 3; TLC on MC3 holds vacuously on the unmutated spec, so the CI job catches future edits to the model, not regressions in observe.py; see module header Scope caveat for the OBS-009 gap). The sdd/research/tla-poc/ tree stays as the frozen 2026-04 PoC artefact. .github/workflows/ci.yml gains a non-blocking verify-tla job (pinned tla2tools.jar@v1.8.0, SHA-256 verified) that runs MC3 from the live layer. Status revisit tracked as ID-150 for 2026-10-19.
Internal¶
- test_streaming_integrity SFTP→Azure pipe-threshold adjusted (BUG-174): PIPE_THRESHOLD raised 1.5 MiB → 2.25 MiB. Root cause is a tracemalloc attribution artifact: Azure SDK's staged-block uploader holds two 1 MiB chunks live simultaneously when the source is wrapped in io.BufferedReader; revised ceiling reflects the two-chunk hold plus headroom. No production-code change.
- Async test coverage expansion (ID-155, ID-156, ID-157, BK-164): four new modules under tests/aio/ targeting async-specific concerns absent from the existing suite.
  - test_async_pbt_stateful.py (ID-155): RuleBasedStateMachine driving AsyncMemoryBackend and SyncBackendAdapter(MemoryBackend()) in lock-step against a shared model; surfaced BUG-184.
  - test_sync_adapter_conformance.py extended with live S3/moto, SFTP, and Azurite fixtures (ID-156) so AsyncBackendSyncAdapter is exercised against real SDKs and connection pools.
  - test_async_azure_live.py (ID-157): 17 Azurite-backed tests covering ETag/last_modified propagation, chunked download, USER_METADATA round-trip, and HNS write_atomic.
  - test_async_drift.py, test_async_adapter_unit.py, test_async_error.py (BK-164): API-parity guard against the sync surface, executor-boundary unit tests, and error-passthrough assertions.
- Async e2e streaming integrity test (ID-138): tests/e2e/test_async_streaming_integrity.py validates a five-hop async chain (AsyncMemoryBackend → AsyncAzureBackend (Azurite) → AsyncMemoryBackend → SyncBackendAdapter(LocalBackend) → AsyncMemoryBackend) with per-hop SHA-256 verification. Falls back to a no-Azurite chain when the service is unavailable.
- HNS write_atomic WriteResult parity tests (BUG-181): four mock tests added to TestAsyncAzureHNSPaths covering rich-field population from get_file_properties(); version_id/digest confirmed None on HNS (ADLS Gen2 PathProperties does not surface these); metadata= forwarding to pre-rename upload_data; overwrite mode guards.
- pytest-asyncio event-loop leak and ResourceWarning fixes (ID-158, BUG-180): session-scoped _close_leaked_event_loops fixture closes orphaned loops before GC at teardown, eliminating the ResourceWarning promoted to error by BK-158. Companion fix: UrllibTransport._request() wraps the caught HTTPError in contextlib.closing() so the file descriptor is released before GC on Python 3.14.
- PBT BackendModel tracks empty directory nodes (BUG-183): BackendModel in test_pbt_stateful.py derived implicit dirs from the live-file map, forgetting nodes after the last file was deleted and diverging from MemoryBackend's MEM-DS-006 dir-retention semantics. Fixed with a separate self.dirs set; delete_folder rule added. No production-code change.
- Test fixture and duplication consolidation (ID-153, BK-156, BK-163): _free_port, moto_server, azurite_server, and availability helpers promoted to root tests/conftest.py, eliminating cross-boundary imports. ~110 duplicate conformance tests removed from test_sftp.py, test_azure.py, and test_sqlblob.py. Five shared S3/S3-PyArrow invariants extracted to test_s3_shared.py.
- Extension import-path contract (BK-160, BK-161): Rule 12 added to sdd/DESIGN.md requiring extensions to use public import paths when one exists. test_no_private_module_imports AST checker enforces it; 10 private-path imports fixed across 10 modules; 3 justified exceptions documented with inline comments.
- filterwarnings = error in pytest (BK-158): unhandled warnings now fail the suite; existing SQLAlchemy suppressors retained with inline justification.
- S3 and S3-PyArrow test and spec consolidation (BK-155): shared invariants extracted to tests/backends/test_s3_shared.py, parametrized over both backends with per-param pytest.mark.spec(...) marks preserving S3-NNN and S3PA-NNN traceability; Category-1 conformance duplicates removed from test_s3.py and test_s3_pyarrow.py.
- Dafny WriteResult: last_modified spec-opacity closed (ID-152): MemoryBackend.dfy:Write returns a capability-conditional timestamp witness (Some(0) when CapWriteResultNative in capabilities) instead of the hardcoded Option_None(). Adapter at tests/backends/dafny_oracle.py lifts Some(n) to datetime.fromtimestamp(n, tz=timezone.utc).
- Hypothesis property coverage for WriteResult (ID-151c): adds tests/test_pbt_write_result.py with two property tests on top of TestWriteResultConformance. The first exercises WriteResult.size == len(payload) on MemoryBackend (small regime) and LocalBackend (256 KiB – 1 MiB BUG-168 buffer boundary; the only v1 backend whose write path runs through a real BufferedWriter) across bytes/BinaryIO inputs and overwrite=True/overwrite=False. The second exercises WR-012 echo + WR-013 get_file_info round-trip on the two backends that go through a real SDK serialisation layer (S3 via moto server mode; Azure via Azurite when reachable). Strategies are module-scope and WR-011-compliant; profiles inherit from tests/conftest.py (dev 50 / ci 100 / nightly 1000).
- Retired per-backend WriteResult test duplication (ID-151b): removed TestLocalWriteResult, TestSFTPWriteResult, TestS3PyArrowWriteResult, and the generic write / write_atomic / size / metadata-echo methods from TestS3WriteResult, TestAzureWriteResult, and TestAzureWriteResultIntegration. All removed cases are now covered by TestWriteResultConformance. Azure- and S3-specific SDK-level assertions (etag stripping, version_id, digest, metadata-passed-to-SDK, HNS atomic path, capability declarations, Azurite etag / last_modified wire checks) are retained.
- Dependabot auto-merge workflow hardening: restores the github.repository == 'haalfi/remote-store' guard, removes the unused dependabot/fetch-metadata step, switches the merge command to --squash --delete-branch, adds a per-PR concurrency group, and guards on pull_request.state == 'open'. Header comment documents the intentional pull_request_review trigger and the recovery path for PRs that miss the workflow window.
- MemoryBackendMinimal satisfiability witness (ID-151, part 4): adds a sibling refinement class to sdd/formal/MemoryBackend.dfy that declares neither CapWriteResultNative nor CapUserMetadata. This makes the WR-010 CapabilityNotSupported gate live code (not dead code as in MemoryBackend) and forces wr_source to BasicSource on every successful write, closing the refinement coverage gap flagged in the part-1 review. bash scripts/dafny_verify.sh MemoryBackend.dfy passes with 98 verified, 0 errors.
- DafnyOracleBackend adapter widening (ID-151, part 5): write() and write_atomic() in tests/backends/dafny_oracle.py now accept a metadata= kwarg and return WriteResult, matching the Part-1 ABC. get_file_info() and list_files() now marshal the Dafny FileInfo.metadata field. All TestWriteResultConformance skips on dafny-oracle are removed; test_native_populates_last_modified is xfailed (BUG-169 parity: the Dafny MemoryBackend.Write hardcodes Option_None() for last_modified).
- Python WR-* conformance assertions (ID-151, part 3): adds TestWriteResultConformance in tests/backends/test_conformance.py, exercising every backend's write/write_atomic return value against the Dafny Write postconditions in sdd/formal/BackendContract.dfy (spec 045 WR-001a, WR-004, WR-005, WR-012, WR-013). Rich-field checks are gated on Capability.WRITE_RESULT_NATIVE; metadata checks are gated on Capability.USER_METADATA. The new fixture-level assertions surface two pre-existing backend defects as strict xfails, BUG-169 (MemoryBackend drops last_modified) and BUG-170 (SQLBlobBackend drops last_modified), so fixing each bug flips the xfail and forces the xfail marker off in the same commit.
- Dependabot auto-merge workflow: adds .github/workflows/dependabot-auto-merge.yml. On approval of a Dependabot PR by the repo owner, enables GitHub auto-merge (squash); GitHub merges automatically once the gate check passes.
- Dafny WriteResult oracle regeneration + adapter update (ID-151, part 2): scripts/dafny_translate.sh Docker wrapper (analogous to scripts/dafny_verify.sh) translates Dafny specs to Python using dafny build -t py without a native Dafny install. Companion scripts/_dafny_classorder.py automates the previously manual class reorder (ADT types → Backend → default__ → MemoryBackend) so module_.py imports cleanly. Regenerates MemoryBackend-py/ under the Part 1 contract (Dafny 4.11.0, matching the verify pin), and DafnyOracleBackend now passes Option_None() for the new fourth metadata parameter on Write. Oracle-gated conformance run green (154 passed, 5 skipped). Follow-up remainder in BACKLOG.md: MemoryBackendMinimal satisfiability witness for the CapabilityNotSupported/BasicSource branches.
- Dafny WriteResult extension (ID-151, part 1): widens the Backend trait Write to return Result<WriteResult> with a fourth metadata parameter, adds FileInfo/WriteResult/ContentDigest/WriteSource data models, CapWriteResultNative and CapUserMetadata capabilities, and the WriteResultFromFileInfo function plus the WR008FieldMapping lemma. Encodes WR-001a, WR-004, WR-005, WR-010 (with the empty-mapping carve-out via HasUserMetadata), WR-012, and WR-013 as Write postconditions; the WR-006 negative direction (Write never produces SidecarSource) is enforced structurally by the Write postcondition restricting source to NativeSource | BasicSource. MemoryBackend refines the widened contract.
- scripts/gen_pages.py refactor: split the 840-line mkdocs-gen-files hook into scripts/docs/{scan,render,nav,link}.py plus a 70-line orchestrator; example metadata and link rewrites are now data-driven via SddKind, self-describing example docstrings, and LinkResolver.
- Microsoft Graph backend, SDD artifacts (ID-127): accepted sdd/rfcs/rfc-0010-graph-backend.md, ADRs sdd/adrs/0021-graph-sdk-choice.md (SDK choice), sdd/adrs/0022-graph-auth-model.md (auth model), sdd/adrs/0023-async-monitor-polling.md (async monitor polling), and sdd/adrs/0024-resource-locked-error.md (ResourceLocked error), plus sdd/specs/044-graph-backend.md (GR-001..GR-057). Amends sdd/specs/005-error-model.md with ERR-013 (ResourceLocked) and sdd/specs/025-retry-policy.md with RET-015 (Graph retry mapping). No runtime changes.
- Test-quality cleanup on coverage PR (BK-151): cross-platform-safe tests, real assertions on previously mock-only checks, spec= on every MagicMock().
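The BUG-183 entry above turns on one modelling choice: directories are tracked in their own set instead of being derived from the live-file map. The sketch below illustrates that choice only. BackendModelSketch is a hypothetical class, not the BackendModel in test_pbt_stateful.py, and the "reject non-empty folders" rule in delete_folder is an assumption made for the example.

```python
from typing import Dict, Set


class BackendModelSketch:
    """Hypothetical model that remembers directories independently of files."""

    def __init__(self) -> None:
        self.files: Dict[str, bytes] = {}
        self.dirs: Set[str] = set()

    def write(self, path: str, data: bytes) -> None:
        parts = path.split("/")
        # Record every ancestor directory explicitly.
        for depth in range(1, len(parts)):
            self.dirs.add("/".join(parts[:depth]))
        self.files[path] = data

    def delete(self, path: str) -> None:
        # Removing a file leaves its ancestors in self.dirs untouched.
        del self.files[path]

    def is_folder(self, path: str) -> bool:
        return path in self.dirs

    def delete_folder(self, path: str) -> None:
        prefix = path + "/"
        has_children = any(f.startswith(prefix) for f in self.files) or any(
            d.startswith(prefix) for d in self.dirs
        )
        if has_children:
            raise OSError("directory not empty")  # assumed rule for this sketch
        self.dirs.discard(path)


model = BackendModelSketch()
model.write("d/0", b"x")
model.delete("d/0")
retained = model.is_folder("d")   # directory survives its last file
model.delete_folder("d")
removed = not model.is_folder("d")
```

A model that derived directories from model.files alone would report "d" as gone immediately after the delete, which is the divergence the entry describes.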
[0.23.0] - 2026-04-12¶
Added¶
- Capability.LAZY_READ (BK-146): New quality flag that indicates read() fetches data lazily on demand from the native source. Backends that pre-load the full file into memory before returning a stream (Memory, SQLBlob, SQLQuery) do not declare it. Declared by Local, HTTP, S3, S3-PyArrow, SFTP, and Azure. Spec SIO-009 added; conformance tests added; capabilities matrix, backend guides, and FEATURES.md updated.
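The distinction the flag draws can be shown without any backend. In this sketch, chunk_source, read_preloaded, and LazyReader are hypothetical names invented for illustration; the logs record how many chunks were fetched from the source by the time the caller has read the first four bytes.

```python
import io
from typing import Iterable, Iterator, List


def chunk_source(log: List[int], count: int = 4, size: int = 4) -> Iterator[bytes]:
    """Stand-in for a remote object; records each chunk actually fetched."""
    for i in range(count):
        log.append(i)
        yield bytes([i]) * size


def read_preloaded(chunks: Iterable[bytes]) -> io.BytesIO:
    # Non-lazy: the whole source is consumed before a stream is returned.
    return io.BytesIO(b"".join(chunks))


class LazyReader(io.RawIOBase):
    """Lazy: pulls the next chunk only when the caller asks for bytes."""

    def __init__(self, chunks: Iterable[bytes]) -> None:
        super().__init__()
        self._chunks = iter(chunks)
        self._pending = b""

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # end of stream
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


eager_log: List[int] = []
eager = read_preloaded(chunk_source(eager_log))
eager_first = eager.read(4)

lazy_log: List[int] = []
lazy = LazyReader(chunk_source(lazy_log))
lazy_first = lazy.read(4)
```

Both readers return the same first four bytes, but the preloading reader has already fetched every chunk, while the lazy one has fetched only the chunk it needed.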
Fixed¶
- Azure chunked upload (BUG-161): AzureBackend.write() now uses staged-block upload instead of buffering the entire stream into memory. Sets max_single_put_size and max_block_size defaults (256 KiB) on both BlobServiceClient and DataLakeServiceClient. Users can override via client_options for throughput tuning on large files.
- Transfer pipe memory overhead (BUG-162): Backends that relied on the platform-default shutil.copyfileobj buffer (1 MiB on Windows) now use an explicit 256 KiB copy buffer. This keeps peak pipe-layer memory (two live chunks: current read + previous write) well under the streaming threshold. SFTP was unaffected (already used 32 KiB chunks).
- known_hosts mode test skipped on Windows (BUG-163): NTFS ignores POSIX mode bits; mode-assertion test now skipped on Windows. File-creation check retained on all platforms.
- cast() string-literal removal (BK-146): All cast("TypeName", value) patterns replaced with cast(TypeName, value) # noqa: TC006. Removed redundant BufferedReader-over-BytesIO wrappers in Memory and SQL backends (BytesIO is a BufferedIOBase and needs no extra buffering). Removed BufferedReader from S3 backend; s3fs AbstractBufferedFile already provides internal buffering. FileInfo/FolderInfo promoted to runtime imports in cache.py.
- Pages deployment failure (BUG-144): Batched mike pushes into one and replaced the built-in deployment with an explicit deploy-pages@v4 (Node 20) job, fixing the "Multiple artifacts" error and the DEP0040 punycode warning.
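The BUG-162 change above amounts to passing an explicit length to shutil.copyfileobj instead of relying on the platform default. The sketch below is illustrative only: CountingWriter is a hypothetical helper that records the largest chunk it receives, and the 1 MiB payload is an arbitrary test size.

```python
import io
import shutil

COPY_BUFFER = 256 * 1024  # 256 KiB, the size named in the entry above


class CountingWriter(io.BytesIO):
    """Hypothetical sink that remembers the largest single write it saw."""

    def __init__(self) -> None:
        super().__init__()
        self.max_chunk = 0

    def write(self, data) -> int:
        self.max_chunk = max(self.max_chunk, len(data))
        return super().write(data)


src = io.BytesIO(b"x" * (1024 * 1024))
dst = CountingWriter()
# The third argument bounds how many bytes are read and written per iteration.
shutil.copyfileobj(src, dst, COPY_BUFFER)
copied = len(dst.getvalue())
```

With the explicit length, no chunk passed between source and destination exceeds 256 KiB regardless of the platform's default buffer size.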
Documentation¶
- Documentation content longevity rules (BK-148): New sdd/CONTENT-RULES.md keeps prose from drifting out of sync with code and generated artefacts. DOCUMENTATION.md and CLAUDE.md updated to reference it.
- Documentation: remove stale enumerations, unify API references (BK-149): Removed hardcoded counts from guides and docstrings. Reordered Capability enum logically; updated all references consistently. Standardized ext module docstrings to MkDocs admonition syntax.
- Documentation: apply content-longevity rules to Guides, Explanation, and Examples (BK-149b): Pseudo-precise values, exhaustive enumerations, and stale counts removed from Guides, Explanation, Getting Started, README, and example docstrings. Completes BK-149.
- SQL blob non-lazy write (ID-136): Backend guide, capabilities matrix, and docstring updated.
- Design index and Further Reading reshaped (BK-150): design/ now surfaces every sdd/ process document (Design, Testing, Documentation Standards, Content Rules, Process) plus a new Audits section generated from sdd/audits/. further-reading.md stops duplicating the SDD trail and points at design/ instead; it keeps only documentation-convention and community/policy links (Contributing, Dev Story, Changelog, Security, Code of Conduct, Citation). Contributing and Development Story dropped from the top nav.
- Azure backend guide (ID-137): Updated max_block_size and max_single_put_size library defaults from 256 KiB to 1 MiB.
Internal¶
- Streaming overhead reduction (ID-137): Reduced per-hop allocations across Memory, SFTP, Azure, and S3-PyArrow backends. Azure block size decoupled from the pipe-layer copy buffer; streaming integrity thresholds recalibrated from e2e measurements.
- Streaming integrity test hardened (BUG-161, BUG-162): Memory and chunk violations are now hard failures instead of warnings. Non-lazy destinations (SQL BLOB) are exempt from chunk-count checks (by-design, ID-136).
- Dafny upgraded to 4.11.0 (ID-134c): Removed SumSizesAddOneLocal workaround; the Boogie cross-file lemma bug is fixed in 4.11.0. Toolchain updated throughout (CI, session-init.sh, dafny_verify.sh, README); ubuntu-22.04 zip used (20.04 not published for 4.11.0).
- Dafny ghost infrastructure for GetFolderInfo aggregate verification (ID-134, part 1): Added ChildFiles, SumSizes, and SumSizesAddOne to BackendContract.dfy. Pure additions; no existing postconditions changed.
- Verified GetFolderInfo aggregate postcondition (ID-134, part 2): Strengthened GetFolderInfo postcondition to assert file_count == |ChildFiles(fs, path)| and total_size == SumSizes(fs, ChildFiles(fs, path)). MemoryBackend.dfy refinement proves the loop computes these correctly via ghost set tracking and SumSizesAddOne induction.
- SDD Expert in orchestrate skill (BK-147): Added a 5th domain expert focused on spec-code consistency, ADR coverage, and process guide accuracy. Scoped to sdd/ (specs, ADRs, RFCs, formal, process guides).
- Mutation testing CI workflow (BK-145): Added .github/workflows/mutation.yml with manual (workflow_dispatch) and weekly scheduled (cron, Saturdays 05:00 UTC) triggers. Runs all 6 scoped mutation targets in parallel via matrix strategy, uploads HTML reports as artifacts, and writes job summaries. Cloud-backend scope starts MinIO/Azurite/SFTP services. Gremlins cache persisted across runs via actions/cache.
- E2e streaming integrity test (ID-135): Proves the streaming contract with round-robin SHA-256 verification and tracemalloc memory profiling across all backends.
- Codecov upload moved to publish workflow: Coverage is now uploaded to Codecov on release: published (in publish.yml) rather than on every CI run. Ensures Codecov reflects released versions only, matching PyPI. The coverage job runs after a successful publish (needs: publish), starts Azurite, runs the full suite at the 95% threshold, and uploads coverage.xml.
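The SHA-256 verification idea in the e2e streaming integrity entry (ID-135) can be sketched with in-memory streams standing in for backends. This is an assumption-laden illustration, not the project's test: pipe_with_digest is a hypothetical helper, the 64 KiB chunk size is arbitrary, the hops form a simple chain rather than a round-robin over real backends, and memory profiling is left out.

```python
import hashlib
import io
from typing import BinaryIO, List

CHUNK = 64 * 1024  # illustrative chunk size, not a project constant


def pipe_with_digest(src: BinaryIO, dst: BinaryIO, chunk_size: int = CHUNK) -> str:
    """Copy src to dst chunk by chunk, hashing the bytes as they pass through."""
    digest = hashlib.sha256()
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        dst.write(chunk)
    return digest.hexdigest()


payload = bytes(range(256)) * 1024  # 256 KiB of deterministic data
expected = hashlib.sha256(payload).hexdigest()

hops: List[io.BytesIO] = [io.BytesIO() for _ in range(3)]
current: BinaryIO = io.BytesIO(payload)
digests: List[str] = []
for hop in hops:
    digests.append(pipe_with_digest(current, hop))
    hop.seek(0)      # rewind so the next hop reads from the start
    current = hop
```

Comparing each hop's digest against the expected value localizes corruption to the hop that introduced it, which a single end-to-end comparison cannot do.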
[0.22.1] - 2026-04-10¶
Fixed¶
- CodeQL alerts (BK-143): Resolved all 31 open CodeQL security/quality alerts: SFTP known_hosts permissions (0o644 → 0o600), resource cleanup, unused imports, and type-stub no-ops. Follow-up: kept BinaryIO as a runtime import (suppressing TCH003/TC006) so cast(BinaryIO, ...) call sites remain genuinely used at runtime and are not flagged by CodeQL.
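For the known_hosts permission change above, one common way to create a file that is owner-readable and owner-writable only is to pass the mode to os.open at creation time. This is a generic sketch, not the SFTP backend's code: create_private_file is a hypothetical name, and on Windows the mode bits are largely ignored (as the BUG-163 entry in 0.23.0 notes for NTFS).

```python
import os
import stat
import tempfile


def create_private_file(path: str) -> None:
    """Create an empty file with 0o600 permissions (hypothetical helper)."""
    # O_EXCL makes creation fail if the file already exists, so an existing
    # file with looser permissions is never silently reused.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "known_hosts")
    create_private_file(target)
    mode = stat.S_IMODE(os.stat(target).st_mode)
```

Setting the mode at creation avoids the window in which the file briefly exists with default permissions before a later chmod.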
[0.22.0] - 2026-04-09¶
Added¶
- Dafny-compiled oracle as conformance gate (BK-139c, ID-133): The mathematically verified MemoryBackend.dfy (53 proofs, 0 errors) is now compiled to Python and runs through the full conformance suite as DafnyOracleBackend. Validates the conformance suite: if the oracle passes a test, the test is known-correct. Handwritten POC oracle removed. Compiled with Dafny 4.9.1 (downgraded from POC's 4.11.0; 4.9.1 is the version available in the CI environment).
- Capability.ATOMIC_MOVE (ID-128): New capability flag indicating move() is guaranteed atomic under concurrent access. Declared by Local, Memory, and SQLBlob backends. S3, S3-PyArrow, Azure, and SFTP omit it (copy-then-delete semantics). Query with store.supports(Capability.ATOMIC_MOVE).
- Extended conformance suite (BK-139b): 42 test functions (~53 parameterized cases per backend) derived from Dafny BackendContract.dfy postconditions. Covers error fidelity, precondition ordering, listing completeness, depth filtering, move/copy semantics, resource cleanup, and operational consistency. Marked @pytest.mark.extended_conformance for CI isolation.
- ResourceWarning safety net (BK-139b): __del__ methods on SFTPBackend, AzureBackend, and AsyncAzureBackend emit ResourceWarning if .close()/.aclose() was not called. Sync backends also clean up connections; async backend warns only (cannot await in __del__).
- Ruff BLE rule set (BK-139b): Enabled BLE001 (blind exception) linter rule. All 44 existing intentional broad catches annotated with # noqa: BLE001 (37 in src/, 7 in tests/).
- Property-based tests (BK-139a): Hypothesis PBT suite covering partition roundtrip (P1), config from_dict corruption (P2), path normalization idempotence (P3), and stateful MemoryBackend model (P4). Three profiles: dev (50 examples), ci (100), nightly (1000).
- PBT testing rules in sdd/TESTING.md: Rules 9–11 covering the @given assertion requirement, profile discipline, and strategy scope.
- _safe_wrap() helper in _stream.py: Safely wraps a stream through one or more wrapper layers, closing all acquired resources if any wrapper constructor raises.
- Dafny formal verification layer (sdd/formal/): Machine-checkable backend contract specification covering the six BK-140 gaps: precondition ordering (BE-008), error mapping (BE-021), listing semantics (BE-014/015), depth counting (DEPTH-001), move atomicity (BE-018), and resource safety (SIO-001). Includes MemoryBackend reference refinement, verified depth algorithm, and _safe_wrap leak-freedom proof.
- Dafny GetFolderInfo method (ID-130): Added GetFolderInfo to BackendContract.dfy with postconditions IsFile → InvalidPath, !PathExists → NotFound, IsDir → Ok. Verified in MemoryBackend.dfy. Closes the BE-017 formal coverage gap.
- CI: Dafny verification gate in ci.yml: Runs dafny verify on all formal specs when sdd/formal/ or sdd/specs/ files change. Skipped for code-only PRs.
- DafnyOracle POC (BK-139c): Proof-of-concept reference oracle implementation derived from MemoryBackend.dfy formal specification. Two approaches validated: handwritten oracle (680 lines, 32 passing tests) and compiled oracle (Dafny 4.11.0, 41 verified proofs). Demonstrates feasibility of spec-based conformance testing. See sdd/formal/POC/ for implementation and roadmap. Not production-ready.
- BK-140a backend contract spec amendments (sdd/specs/): Six spec amendments tightening the backend ABC behavioral contract: precondition evaluation order with flat-namespace exemption (BE-008), canonical error mapping table aligned with Dafny postconditions (BE-021), missing-path listing semantics (BE-014/015), reference depth algorithm (DEPTH-001), move/copy atomicity documentation (BE-018/019), and acquire-then-wrap resource safety invariant (SIO-001). Per-method Raises clauses updated for BE-006 through BE-019 to be consistent with the canonical table.
- Query method behavior under file-as-directory-component (ID-129): Codified behavior for exists(), is_file(), is_folder() when paths contain file-as-directory-component ancestors (e.g., querying a/b/c when a/b is a file). All three query methods return False rather than raising InvalidPath. Spec amendments to BE-004, BE-005, BE-021 document this "accidental consensus" behavior. Dafny formal methods IsFileMethod() and IsFolderMethod() with AllAncestorsTraversable predicate verify the contract in MemoryBackend.dfy. Extended conformance tests cover all backends (Local, S3, Azure, SFTP, etc.).
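The acquire-then-wrap behaviour described for the _safe_wrap() helper above (close everything already acquired if any wrapper constructor raises) can be sketched as follows. safe_wrap_sketch and failing_wrapper are hypothetical names; the real helper's signature and internals are not shown in this changelog, so treat this as one plausible shape rather than the library's code.

```python
import io
from typing import Any, Callable


def safe_wrap_sketch(raw: Any, *wrappers: Callable[[Any], Any]) -> Any:
    """Hypothetical stand-in: wrap raw through each layer, cleaning up on failure."""
    acquired = [raw]
    try:
        stream = raw
        for wrap in wrappers:
            stream = wrap(stream)
            acquired.append(stream)
        return stream
    except BaseException:
        # Close outermost first; closing is best-effort so the original
        # error is the one the caller sees.
        for resource in reversed(acquired):
            try:
                resource.close()
            except Exception:  # deliberate broad catch during cleanup
                pass
        raise


def failing_wrapper(stream: Any) -> Any:
    raise ValueError("wrapper constructor failed")


raw_ok = io.BytesIO(b"data")
wrapped = safe_wrap_sketch(raw_ok, io.BufferedReader)
content = wrapped.read()

raw_bad = io.BytesIO(b"data")
try:
    safe_wrap_sketch(raw_bad, io.BufferedReader, failing_wrapper)
except ValueError:
    failed = True
else:
    failed = False
```

On the success path the raw handle stays open and is owned by the returned wrapper; on the failure path the handle is closed before the exception propagates, so nothing is leaked.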
Fixed¶
- PBT stateful model: read_bytes on implicit directory (BUG-160): The BackendModel.read_bytes rule did not skip paths that are implicit directories (created as a side-effect of writing a nested file). Calling read_bytes('d') after write_new('d/0') raised InvalidPath instead of NotFound, causing the pytest.raises(NotFound) guard to fail. Added an early-return guard for directory paths.
- ADR-0008 conformance: ext.arrow Tier 1 probe (BK-141): Updated StoreFileSystemHandler.__init__ to narrow exception catch from Exception to (CapabilityNotSupported, TypeError, OSError) and documented the capability-probe pattern in ADR-0008 as an explicit exception to the "CapabilityNotSupported must propagate" rule. OSError catch handles cloud backend initialization failures (e.g., S3 endpoint unreachable). Added test test_tier1_unexpected_exception_propagates to verify unexpected exceptions are not silently caught. The pattern is now ADR-endorsed for optional feature detection during extension initialization.
- Type-mismatch errors now raise InvalidPath per spec (ID-131): read(), read_bytes(), delete(), get_file_info(), get_folder_info(), delete_folder() on the wrong path type (directory vs file) now raise InvalidPath instead of NotFound in LocalBackend, MemoryBackend, and SFTPBackend, matching the Dafny BackendContract.dfy postconditions and BE-021 canonical error mapping. move()/copy() now check source and destination types across all three backends. Self-move and self-copy (src == dst) are no-ops in Local, Memory, S3, S3-PyArrow, and SFTP backends (previously leaked SameFileError or lost data).
- S3 read() leaks file handle if stream wrapping fails (BUG-159): Both S3Backend.read() and S3PyArrowBackend.read() now use the new _safe_wrap() helper to close raw handles if wrapping constructors raise.
Documentation¶
- Custom backend guide: conformance suite integration (ID-132): The "Testing your backend" section now explains how to register a new backend in tests/backends/conftest.py to run the full conformance suite automatically, how _require() / capability gating causes tests to self-skip (not fail) for partial-capability backends, and the flat-namespace vs. hierarchical backend distinction (_FLAT_NAMESPACE_BACKENDS). Added a conformance checklist table (basic, extended, error mapping, repr safety). Corrected spec coverage range (BE-001–BE-025 + ancillary specs) and test count (50 Dafny-derived tests).
- Added "Quality & Testing" section to README explaining testing dimensions (spec-driven development, unit tests, PBT, formal verification, mutation testing, benchmarks, examples).
Internal¶
- CodeQL hardening (BK-142): Scoped CodeQL analysis to src/remote_store/ via .github/codeql/codeql-config.yml; upgraded query suite from default to security-and-quality; added on.paths trigger filter on push to skip doc-only merges; removed paths filter from pull_request trigger so the "Analyze (Python)" status check is always posted (prevents branch-protection merge blocks on non-code PRs); added dependency-review job to catch CVEs in dependency changes on PRs; annotated intentional pickle.loads (dagster ext) and ruamel.yaml safe-mode loader (yaml ext) with CodeQL justification comments.
[0.21.1] - 2026-04-03¶
Fixed¶
- Azure list_files ignores max_depth (BUG-155): Both AzureBackend.list_files and AsyncAzureBackend.list_files now implement depth limiting when recursive=True and max_depth is specified, consistent with S3 and Local backends.
- Sync AzureBackend.close() doesn't close DefaultAzureCredential (BUG-156): Sync backend now caches the credential and closes it in close(), matching the async backend's aclose() pattern.
- Sync AzureBackend.delete_folder non-HNS materializes all blobs (BUG-157): Existence check now stops after the first blob instead of eagerly fetching all blobs into memory.
- Sync AzureBackend.read() doesn't protect downloader on stream-wrapping failure (BUG-158): The downloader is now cleaned up if _ErrorMappingStream or BufferedReader construction fails.
- LocalBackend leaks IsADirectoryError for directory paths (BUG-153, BUG-154): read(), read_bytes(), delete(), write(), write_atomic(), and open_atomic() now catch IsADirectoryError and raise NotFound (read/delete) or InvalidPath (write), consistent with MemoryBackend. delete(missing_ok=True) on a directory is now silenced, matching MemoryBackend's behavior.
- S3 client_options shallow copy mutates caller's nested dicts (BUG-148): Lazy filesystem init now deep-copies client_options so that adding region_name, config, or verify to client_kwargs does not modify the caller's original dict. Affects both S3Backend and S3PyArrowBackend.
- S3PyArrow get_file_info returns no ETag and no digest (BUG-150): get_file_info now uses call_s3("head_object", ChecksumMode="ENABLED") like S3Backend, returning both ETag and digest when available.
- S3PyArrow _extract_etag scope too broad (BUG-151): _extract_etag override now only affects listing paths; get_file_info extracts ETag from the HeadObject response.
- S3 list_files ignores max_depth (BUG-152): _S3Base.list_files now implements native depth limiting during BFS traversal, consistent with all other backends.
- SFTP delete_folder masks listdir permission errors (BUG-147): Non-recursive delete_folder now re-raises non-ENOENT errors from listdir instead of silently treating them as empty.
- SFTP listing methods silently swallow non-ENOENT errors (BUG-146): list_files, list_folders, and iter_children now only suppress ENOENT from listdir_attr; other errors propagate as RemoteStoreError.
- SFTP _ensure_parent_dirs swallows permission errors (BUG-145): Parent directory creation now only catches ENOENT on stat and EEXIST on mkdir; other errors propagate.
- SFTP SSH client leaked on connection failure (BUG-144): _connect() now closes the SSHClient if the retry-wrapped connect exhausts attempts.
- SFTP st_mode None causes TypeError (BUG-143): Entries with st_mode is None are now skipped in listing, traversal, and stats methods.
- SFTP read() leaks file handle if stream wrapping fails (BUG-142): The paramiko file handle is now closed if _ErrorMappingStream or BufferedReader construction raises.
- config_loaders.py example crashes on Windows (BUG-136): Path interpolation into TOML/YAML strings produced backslashes which are invalid escape sequences. Now uses forward slashes on all platforms.
- CachedStore write doesn't invalidate parent directory metadata (BUG-137): Writing a nested path (e.g. dir/file.txt) now also invalidates cached exists() / is_folder() / is_file() entries for all ancestor directories, not just the leaf path.
- CachedStore.child() creates isolated cache (BUG-138): _wrap_child() now passes the parent's cache backend to the child and tracks the child's path prefix so mutations through a child store correctly invalidate the parent's cached entries.
- RegistryConfig.from_dict crashes on null options (BUG-139): YAML/TOML options: with no value (None) is now treated as an empty dict instead of raising TypeError.
- RegistryConfig.from_dict converts null to string "None" (BUG-140): Null type or backend now raises TypeError with a clear message instead of silently producing the string "None". Null root_path is treated as empty string (same as omitted).
- partition_path allows = in key (BUG-141): Partition keys containing = now raise ValueError, preventing round-trip failures with parse_partition.
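The round-trip failure behind BUG-141 can be illustrated with a minimal sketch. It assumes Hive-style key=value path segments, which this changelog does not spell out; build_partition_path and parse_partition_path are hypothetical stand-ins, not the library's partition_path / parse_partition.

```python
def build_partition_path(**parts: str) -> str:
    """Join key=value segments; reject '=' in keys (mirrors the BUG-141 fix)."""
    for key in parts:
        if "=" in key:
            raise ValueError(f"partition key must not contain '=': {key!r}")
    return "/".join(f"{key}={value}" for key, value in parts.items())


def parse_partition_path(path: str) -> dict:
    """Split each segment on its first '=' to recover key/value pairs."""
    result = {}
    for segment in path.split("/"):
        key, _, value = segment.partition("=")
        result[key] = value
    return result


# Well-formed keys survive the round trip.
path = build_partition_path(year="2026", month="04")
round_tripped = parse_partition_path(path)

# Without the check, key "a=b" with value "c" would serialize to "a=b=c",
# which parses back as key "a" with value "b=c": the round trip is lost.
ambiguous = parse_partition_path("a=b=c")
```

Rejecting the key at build time is the cheaper place to fail, since the parser has no way to tell which = was the separator.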
Changed¶
- Examples reorganized into topical subdirectories: examples are now grouped into getting_started/, configuration/, errors/, advanced/, backends/, extensions/, and integrations/ for easier navigation. The docs index page reflects the new 7-section layout. All import paths, CI workflows, and docs references updated accordingly.
Internal¶
- Remove Pygments <2.21 upper-bound pin: pymdown-extensions 10.21.2 fixed the filename=None highlight bug. Pygments is now unpinned above 2.18.
- Deduplicate pyproject.toml dependency lists (BK-138): Hatch env uses features = ["dev", "docs", "bench"] instead of re-listing 43 packages. bench, docs, and dev extras compose from user-facing backend/extension extras via self-referential dependencies. Removed cargo-culted s3fs from docs extra (all backends use lazy imports).
[0.21.0] - 2026-04-01¶
Added¶
- resolve_env() env-var interpolation (ID-126): resolve_env(data) resolves ${VAR} and ${VAR:-default} placeholders in config dicts. from_toml() and from_yaml() gain an opt-in resolve_env_vars=True parameter. Standalone function exported from remote_store for custom loaders. Spec: CFG-018..CFG-021.
- Async Store API (remote_store.aio) (ID-013): AsyncStore is the async counterpart to Store, with coroutine methods for all operations. AsyncBackend abstract base class for native async backends. SyncBackendAdapter wraps any synchronous backend for async use (delegates to a thread-pool executor). AsyncMemoryBackend for async testing. Phase 1: core primitives.
- AsyncAzureBackend native async backend (ID-013 Phase 2): First native async backend for remote_store.aio. Uses Azure SDK async clients (azure.storage.blob.aio, azure.storage.filedatalake.aio) for true non-blocking I/O. Shared helpers extracted to _azure_common.py for sync/async code reuse. Zero new dependencies.
- FEATURES.md at repo root: versioned snapshot of backends, extensions, capabilities, and install extras for agent and human discoverability (BK-136).
- remote_store.info() public function: runtime introspection of available backends and extensions in the current environment (BK-136).
- CLAUDE.md now references FEATURES.md for cold-start agent sessions (BK-136).
- Release checklist in CONTRIBUTING.md now includes FEATURES.md update (BK-136).
- Dagster multi-partition loading: load_input now returns dict[str, Any] when the input context carries multiple partition keys (time-window aggregation). Applies to both the bytes-serializer IO manager and the dataset IO manager (ID-124, spec DAG-020).
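As a rough illustration of the ${VAR} and ${VAR:-default} placeholders described for resolve_env() above, here is a standalone sketch. resolve_placeholders is a hypothetical helper, not the library function; two behaviours are assumptions: the default applies only when the variable is unset, and an unset variable without a default raises KeyError.

```python
from __future__ import annotations

import os
import re
from typing import Any

# Matches ${NAME} and ${NAME:-default}.
_PLACEHOLDER = re.compile(
    r"\$\{(?P<name>[A-Za-z_][A-Za-z0-9_]*)(?::-(?P<default>[^}]*))?\}"
)


def resolve_placeholders(data: Any, env: dict[str, str] | None = None) -> Any:
    """Recursively substitute placeholders in strings nested in dicts/lists."""
    env = dict(os.environ) if env is None else env

    def substitute(match: re.Match) -> str:
        name, default = match.group("name"), match.group("default")
        if name in env:
            return env[name]
        if default is not None:
            return default
        # Assumption: a missing variable with no default is an error.
        raise KeyError(f"environment variable {name!r} is not set")

    if isinstance(data, str):
        return _PLACEHOLDER.sub(substitute, data)
    if isinstance(data, dict):
        return {key: resolve_placeholders(value, env) for key, value in data.items()}
    if isinstance(data, list):
        return [resolve_placeholders(value, env) for value in data]
    return data  # numbers, booleans, None pass through untouched


config = {"bucket": "${BUCKET}", "region": "${REGION:-eu-west-1}", "port": 9000}
resolved = resolve_placeholders(config, env={"BUCKET": "data"})
```

Passing env explicitly keeps the sketch testable without touching the real process environment.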
Changed¶
- ParquetSerializer.deserialize() now returns a PyArrow Table instead of a pandas DataFrame (BUG-135). Removes the hidden hard dependency on pandas for users installing remote-store[dagster,arrow] without pandas. Callers that need pandas call table.to_pandas() on the result. See Migration Guide.
Documentation¶
- Spec 029 amendments (ID-013b): add round 2 §2.4 items (ASYNC-036/037, ASYNC-052a–e, ASYNC-057/058, ASYNC-061/062) and Phase 2 AsyncAzureBackend spec (ASYNC-070–079). Update max_depth on ASYNC-014/015/017, resolve() in ASYNC-034 passthrough list, and ASYNC-046 enumeration.
- Expand async guide with native backend section (AsyncAzureBackend), health check (ping()), and updated limitations.
- Fix CHANGELOG migration-guide link for GitHub (move guides/migration.md to repo root so migration.md#… resolves in both GitHub and docs).
- Fix stale pandas reference in Dagster guide: the Parquet serializer deserializes to a PyArrow Table, not a pandas DataFrame.
Internal¶
- Test quality: TESTING.md compliance (BK-137): Fixed Rule 2 (sole isinstance → behavioral assertions) and Rule 7 (copy-paste → parametrize) violations in post-v0.20.0 async and dagster tests. Improved coverage for _azure_common (69→100%), _async_azure (89→95%), _sync_adapter (93→98%), _async_store (96→98%).
- Fix 72 ResourceWarning: unclosed database in SQL backend tests by adding proper fixture teardown and close() calls. Filter residual SQLAlchemy pool warning on Python 3.13+ (BK-135).
- Replace isinstance-only assertions (12 tests) and private attribute assertions (~15 instances) with behavioral checks (BK-134).
- Upgrade setup-uv from v7 to v8.0.0 (immutable tags) across all workflows.
- Disable uv caching on lightweight CI jobs to eliminate cache-contention warnings.
[0.20.0] - 2026-03-30¶
Added¶
-
Dagster extension v2 (ext.dagster) (ID-083): DagsterStoreResource (ConfigurableResource) for direct Store access in assets, and RemoteStoreIOManager (ConfigurableIOManagerFactory) for config-driven IO management with automatic Store lifecycle. Dataset mode via dagster_dataset_io_manager() or serializer="parquet-dataset" writes Parquet datasets through ParquetDatasetStore. Spec 031 (DAG-012 -- DAG-019).
-
Parquet Dataset Storage extension (ext.parquet) (ID-122): ParquetDatasetStore provides high-level Parquet dataset read/write with manifest metadata, _SUCCESS completion markers, and atomic-commit semantics. Supports single-file and multi-part layouts, column projection on read, and overwrite semantics. Extension-specific errors: DatasetIncomplete, ManifestCorrupted (import from remote_store.ext.parquet). Spec 042.
-
resolve() introspection API (ID-120): Store.resolve(key) returns a frozen ResolutionPlan dataclass describing how a key maps to its storage location, backend identity, and backend-specific context. Available on all backends with no I/O. Enables debugging ("which backend handled this key?"), principled cache key derivation, and future composite store composition. Spec 043.
-
max_listing_size parameter for cache() (BK-123 M-1): Skips caching listing results (list_files, list_folders, iter_children, glob) that exceed the given item count. Complements the existing max_content_size guard for read_bytes.
-
SQLQueryBackend, a read-only SQL query materializer (ID-119 v2): Maps path keys to SQL queries and serializes results to Parquet, CSV, or Arrow IPC based on the key's file extension. Explicit query mappings via queries dict; strict=True default (view/convention discovery deferred). ResultSerializer protocol with built-in ArrowSerializer. New optional extra: pip install remote-store[sql-query]. Spec 041.
-
SQLBlobBackend, SQL key-value blob storage (ID-119 v1): New SQLBlobBackend backed by SQLAlchemy. Uses a SQL table as key-value store with full Backend contract (all 10 capabilities). SQLite optimizations: WAL mode, PRAGMA synchronous=NORMAL. Supports owned or borrowed engines, custom table names, existing table introspection (create_table=False), and max_blob_size guard. Optional extra: pip install remote-store[sql]. Spec 040.
-
TLS CA bundle support for S3 backends (ID-118): New tls_ca_bundle parameter on S3Backend and S3PyArrowBackend replaces nested client_options={"client_kwargs": {"verify": path}}. Falls back to AWS_CA_BUNDLE / REQUESTS_CA_BUNDLE / SSL_CERT_FILE env vars. Early path validation at construction time. Spec 039.
-
S3 endpoint URL normalization (ID-117): S3Backend and S3PyArrowBackend now accept bare host:port values for endpoint_url and auto-prefix them with https://. Reduces migration friction from PyArrow's endpoint_override, which accepted bare endpoints. URLs with existing schemes are unchanged. Spec S3-025 / S3PA-023.
-
Non-recursive get_folder_info (ID-112): Store.get_folder_info(path, max_depth=N) controls traversal depth for folder statistics. max_depth=0 aggregates only direct children; max_depth=N includes files up to N levels deep. None (default) preserves the existing full-recursive backend delegation. Store-level computation using list_files(); no Backend ABC change. CachedStore and ObservedStore forward the parameter. Spec 038.
-
Depth-limited listing (ID-107, ID-108): Store.list_files(max_depth=N) and Store.list_folders(max_depth=N) control traversal depth without fetching the full recursive tree. When max_depth is set on list_files, recursive is ignored. Client-side filtering at the Store level; no Backend ABC change. Spec 037.
-
Backend-native max_depth optimization (ID-107b): Backend.list_files() now accepts optional max_depth kwarg. Local, SFTP, and Memory backends prune traversal natively, reducing filesystem and network I/O. S3/Azure accept the parameter but defer to Store-level client-side filtering. Spec 037 (DEPTH-003).
-
Azure range reader (ID-102): AzureBackend.read_seekable() returns a seekable stream backed by download_blob(offset=, length=). Each read() issues a single HTTP Range request, with no full-file download. Enables PyArrow Tier 3 column pruning for Parquet on Azure.
-
S3-PyArrow in comparative benchmarks (ID-104): S3-PyArrow now appears in overhead charts, comparative reports, and user-facing verdicts with boto3 as its raw SDK baseline. New S3 vs S3-PyArrow comparison chart.
-
Overhead-vs-RTT chart (ID-104): Replaces the placeholder with a real line chart showing how overhead % changes across network latency profiles (clean, rtt20, rtt50, rtt100). Raw SDK targets added for latency backends for apples-to-apples comparison. Network profile metadata saved in benchmark JSON.
-
--file flag for benchmark tools (ID-104): report.py and charts.py accept --file PATH to load a specific JSON file instead of auto-detecting the latest.
-
Latency matrix benchmark command (ID-104): hatch run bench-latency-matrix runs rtt20/rtt50/rtt100 profiles sequentially. Cross-platform Python script with configurable --profiles, --pool-size, --bench-timeout.
-
Seekable read and cache benchmarks (ID-103 Phase 4): test_seekable.py measures read_seekable() cost (open+read, sequential chunks, random seeks) across backends with different seek strategies. test_cache.py measures CachedStore cold read (miss) vs warm read (hit) vs uncached baseline.
-
Benchmark charts and user-facing report (ID-103 Phases 2--3): SVG chart generation (hatch run bench-charts) for overhead %, overhead vs RTT, and throughput by file size. User-facing verdict report (hatch run bench-report-user) classifying overhead as Negligible/Moderate/Visible/Favorable. Performance guide reframed to lead with the answer. README gains a Performance section.
-
Toxiproxy latency simulation for all backends (ID-103 Phase 1): Toxiproxy now proxies all three network backends (MinIO, Azurite, SFTP). New --network-profile flag with named profiles (clean, rtt20, rtt50, rtt100). New s3-latency and sftp-latency backend params alongside the existing azure-latency.
-
ProxyStore added to API reference (ID-101): ProxyStore is now exported from remote_store and documented. It remains an internal delegation base by design (ADR-0014) but is visible in the inheritance chain of ObservedStore and CachedStore, and useful for building custom Store extensions.
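The endpoint normalization rule from ID-117 (bare host:port gets an https:// prefix, URLs with a scheme are left alone) can be sketched as below. normalize_endpoint is a hypothetical name, and detecting a scheme by looking for "://" is an assumption about how the check might be done, not the backends' actual code.

```python
def normalize_endpoint(endpoint_url: str) -> str:
    """Prefix bare host[:port] values with https://; keep explicit schemes."""
    if "://" in endpoint_url:
        # An explicit scheme (http://, https://) is respected as given.
        return endpoint_url
    return f"https://{endpoint_url}"


bare = normalize_endpoint("minio.internal:9000")
explicit = normalize_endpoint("http://localhost:9000")
```

Keeping explicit http:// untouched matters for local test servers such as MinIO, which often run without TLS.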
Fixed¶
-
Publish workflow no longer runs full CI suite (BK-132): Removed redundant lint/typecheck/test jobs from publish.yml; master branch protection already gates these. Publish now only builds, checks, and uploads. Fixes Python 3.10 dependency resolution failure caused by pytest-gremlins>=1.5 (requires 3.11+).
-
MemoryCache.size() no longer rebuilds dict (BK-127 L-1): Replaced dict comprehension with sum() generator, which avoids a transient 2× memory spike on large caches. Trade-off: size() no longer evicts expired entries as a side-effect; they remain in _data until the next get(), clear_prefix(), or clear_prefixes() call.
-
Replaced mypy ignore_missing_imports overrides with proper type stubs (BK-015): Removed 8 [[tool.mypy.overrides]] entries that suppressed import errors for packages shipping py.typed or having PyPI stubs (pydantic, pydantic_settings, tomli, tomllib, ruamel.yaml, requests, urllib3, httpx). Added types-requests to dev dependencies. Cleaned up now-unnecessary # type: ignore comments in _http_requests.py and _http_httpx.py. Mypy now sees real types instead of Any for these imports.
-
SFTP TOFU host key persistence (BUG-005): TRUST_ON_FIRST_USE now persists accepted host keys to disk on disconnect, creating the known_hosts file and parent directories if absent. Inline keys (code/config/env) are never persisted. Spec SFTP-028.
-
Cache coherency in move/copy operations (BUG-006): CachedStore.move() and CachedStore.copy() now clear the entire cache (instead of selective invalidation) to prevent stale cached entries for nested paths that are relocated or overwritten. Consistent with delete_folder() safety strategy. Spec CACHE-010 updated.
-
Snippet indentation in docs code blocks (BUG-004): named snippet regions inside function bodies rendered with extra leading whitespace. Fixed via pymdownx.snippets dedent_subsections option.
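The MemoryCache.size() trade-off in BK-127 L-1 can be sketched with a toy cache. TinyCache and its (value, expiry) layout are hypothetical; only the contrast between rebuilding a dict and counting with a sum() generator comes from the entry above.

```python
class TinyCache:
    """Toy TTL cache: key -> (value, expiry timestamp). Layout is assumed."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, key: str, value: bytes, expires_at: float) -> None:
        self._data[key] = (value, expires_at)

    def size_rebuilding(self, now: float) -> int:
        # Old style: materialize a second dict of live entries (roughly 2x
        # memory while both exist), which also evicts expired ones.
        self._data = {k: v for k, v in self._data.items() if v[1] > now}
        return len(self._data)

    def size_streaming(self, now: float) -> int:
        # New style: count lazily; nothing is copied and nothing is evicted.
        return sum(1 for _, expires_at in self._data.values() if expires_at > now)


cache = TinyCache()
cache.put("fresh", b"x", expires_at=100.0)
cache.put("stale", b"y", expires_at=10.0)

live_count = cache.size_streaming(now=50.0)
entries_after_streaming = len(cache._data)   # expired entry is still stored
rebuilt_count = cache.size_rebuilding(now=50.0)
entries_after_rebuild = len(cache._data)     # expired entry has been dropped
```

Both variants report the same count; they differ only in peak memory and in whether expired entries are removed as a side effect.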
Changed¶
-
S3 recursive listing memory optimization (BK-123 H-1/H-2): list_files(recursive=True) and get_folder_info now use paginated per-directory ls() calls instead of find(), reducing peak memory from O(total objects) to O(widest directory).
-
MemoryBackend listing lock reduction (BK-123 M-3/M-4/M-5): list_files, list_folders, and iter_children now snapshot state under lock and build results lazily outside it, reducing lock contention during long iterations.
-
MemoryBackend write memory optimization (BK-123 M-6): Stream writes accumulate directly into a bytearray via chunked reads, halving peak memory.
-
CachedStore pre-flight size check (BK-123 M-2): read_bytes checks cached get_file_info size before reading to skip caching oversized files earlier. Zero extra backend calls.
-
Performance messaging rewrite (ID-104): README and performance guide now present overhead as measured values in ms (with percentages in brackets) instead of judgmental language. Users see the numbers and decide for themselves.
-
Seekable read promoted to Store API (ID-100, ID-102): New Store.read_seekable() method that always returns a seekable, backend-optimized stream. On seekable backends (Local, S3, SFTP) it is a zero-overhead passthrough. On Azure it returns _AzureRangeReader (HTTP Range requests per read, ideal for PyArrow column pruning). On HTTP it spools to SpooledTemporaryFile. Replaces ext.seekable.seekable_read() (removed, never released). ADR-0017 supersedes ADR-0016. Spec 036 revised.
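The HTTP spooling strategy mentioned for read_seekable() can be approximated with the standard library alone. This is a sketch under assumptions: spool_to_seekable is a hypothetical helper, and the 1 MiB in-memory threshold is illustrative, not the library's setting.

```python
import io
import shutil
import tempfile
from typing import BinaryIO


def spool_to_seekable(source: BinaryIO, max_memory: int = 1024 * 1024) -> BinaryIO:
    """Copy a forward-only stream into a spooled temp file and rewind it.

    Data stays in memory up to max_memory bytes, then rolls over to disk.
    """
    spooled = tempfile.SpooledTemporaryFile(max_size=max_memory)
    shutil.copyfileobj(source, spooled)
    spooled.seek(0)
    return spooled


# io.BytesIO stands in for a streamed HTTP response body.
stream = spool_to_seekable(io.BytesIO(b"0123456789"))
stream.seek(5)
middle = stream.read(3)
stream.seek(0)
start = stream.read(2)
stream.close()
```

The cost is one full download before the first seek, which is why range-capable backends avoid spooling.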
Removed¶
- Deprecated function aliases removed (BK-130): cached_store(), remote_store_io_manager(), and pydantic_to_registry_config() are removed. Use cache(), dagster_io_manager(), and from_pydantic() respectively. The _deprecated_alias() helper in ext/_helpers.py is also removed. Pre-v1, so no deprecation shim is needed.
Documentation¶
-
Fix docs list completeness findings (BK-129): Add SQLBlob and SQLQuery backends to all backend lists, tables, and matrices across 14 doc files. Remove ghost "Seekable read" entries from extension lists. Add missing extensions to architecture.md. Add read_seekable() directive to Store API reference. Add sql and sql-query extras to README installation section.
-
RFC-0008: Parquet Dataset Storage extension (ID-122): Draft RFC proposing ParquetDatasetStore, a high-level Parquet dataset read/write layer with manifests, _SUCCESS markers, and atomic-commit semantics on top of existing Store primitives.
-
S3 listing strategies and performance (ID-113): New comprehensive guide in guides/backends/s3.md explaining shallow vs. recursive listing, why flat ListObjectsV2 streams beat delimiter-based folder iteration, and why parallelization is wrong for large buckets. Includes performance data and examples showing when to use each approach. New example file examples/backends/s3_listing_strategies.py demonstrates shallow, recursive, and filtered listing patterns.
Internal¶
-
CI test quality gates (BK-126): AST-based assertion checker (scripts/check_test_assertions.py) and MagicMock spec checker (scripts/check_mock_spec.py) now run in CI lint job. Rules 1 and 4 from sdd/TESTING.md are machine-enforced.
-
MagicMock spec= migration (BK-126): All 67 unconstrained MagicMock() calls now use spec= with the correct class, preventing mocks from silently accepting invalid attribute access.
-
Assertion migration (BK-126): 87 test functions that lacked explicit assert or pytest.raises now have meaningful post-condition assertions.
-
pytest-gremlins integration (BK-126): Added pytest-gremlins>=1.5 for mutation testing. New hatch scripts: check-test-quality, test-cov-branch (branch coverage diagnostic). No CI threshold yet.
-
Fix mutation testing scripts (BK-131): Replaced broken mutate/mutate-report scripts with 6 scoped scripts (mutate-core-api, mutate-core-infra, mutate-ext-proxy, mutate-ext-format, mutate-backends-local, mutate-backends-cloud). Original scripts passed source dir as positional arg instead of --gremlin-targets. Scoped runs avoid Windows command-line length limits. Added [tool.pytest-gremlins] config with incremental caching enabled.
-
Eliminate avoidable type: ignore comments (BK-016): Replaced 9 no-any-return suppressions with cast() in ext/cache.py (6) and _stream.py (3). 1 misc in _path.py kept (mypy Final on __slots__ limitation).
-
Document list() materialisation in concurrent batch helpers (BK-127 L-2).
-
Clarify module-level sqlalchemy import rationale (BK-127 L-3).
-
Ruff PT rules enabled (BK-124b): flake8-pytest-style enforced in pyproject.toml. 152 auto-fixed, 13 match= added to pytest.raises, 9 intentional PT012 suppressed. Ruff PT section in TESTING.md marked enabled.
-
Multi-agent orchestration skill (BK-125): /orchestrate skill delegating to 4 domain experts (Store & Backend, Extension, Testing, Documentation) via Claude Code Agent tool. Two modes: implementation and review. ADR-0019 documents the architecture decision.
-
Orchestrate v2: iterative convergence model (BK-128): Redesigned /orchestrate from single-pass parallel to iterative convergence with three complexity modes (Simple, Standard, Complex). Adds plan refinement with experts (1 round), consolidation step, review loop (max 2 rounds), and user as tie-breaker. ADR-0020 supersedes ADR-0019.
-
Testing standards guide (BK-124a): New sdd/TESTING.md codifying 8 test quality rules from research-testing-best-practices. Companion to DESIGN.md § 11 (style). Includes Testing Expert quick reference for BK-125.
-
RFC-0009: Multi-agent orchestration (BK-125): Draft RFC proposing orchestrator + 4 subject matter experts for complex multi-concern tasks. Claude Code native (Agent tool) approach. No code change, process only.
-
Test coverage and ResourceWarning fixes: SQLBlob test fixtures now dispose engines on teardown (ResourceWarning eliminated). ProxyStore delegation coverage 68% → 100% (new test_proxy.py). SQLAlchemy backend coverage 90% → 99% (_glob_to_like, optional columns, health check). /pr skill now gates on hatch run test-cov (95% threshold) before creating PRs.
[0.19.0] - 2026-03-23¶
Changed¶
- Renamed ext factory functions for naming consistency (BK-010):
  - pydantic_to_registry_config() → from_pydantic(): matches the from_* pattern used by from_yaml, from_dict, from_toml.
  - remote_store_io_manager() → dagster_io_manager(): drops the redundant remote_store_ prefix, matches the pyarrow_fs pattern.
  - cached_store() → cache(): bare verb, matches observe().
  - Old names remain as deprecated aliases emitting DeprecationWarning.
Documentation¶
-
Single-source code snippets for docs (ID-057): docs code blocks are now pulled from tested Python files in examples/snippets/ via pymdownx.snippets named regions. CI runs snippet scripts to guarantee they stay valid.
-
Auto-generated example doc wrappers (ID-058): scripts/gen_pages.py now scans examples/*.py, extracts the module docstring, and generates wrapper pages + index + nav entries automatically. Eliminates the class of "forgot to add a wrapper" bugs. Added tests/test_api_coverage.py to verify every __all__ symbol has API documentation.
-
Cross-link compliance pass (BK-013): ## See also sections added to all 27 example pages and all API reference pages. Backend names in capability matrices, choosing-a-backend, concurrency, health-check, performance, and API reference tables now link to their respective guide pages. Added Rule 4 ("Table header/key-column → documented entity") to DOCUMENTATION.md § 4.
-
Docstring and API doc fixes: replaced private-module imports with public API paths in docs, completed extensions table, fixed Sphinx-style remnants.
Internal¶
-
S3 backend code deduplication (BK-011): extracted _S3Base base class, _fileinfo helpers, and error factories from the two S3 backends. Net −94 lines, single maintenance point for 155 previously duplicated lines.
-
Extension code deduplication (BK-012): _StreamWrapper base class in ext/streams.py, generic _run_batch() executor in ext/batch.py, _deprecated_alias() helper in ext/_helpers.py.
-
Test suite deduplication and parametrization (BK-014): refactored 30 of 40 test files (~17,800 → ~16,300 lines, −8.6%) while preserving identical coverage. Parametrized similar tests, extracted shared fixtures, merged single-method classes, and consolidated repeated assertion patterns.
-
SDD document category consolidation (ID-099): merged proposals/ → rfcs/ and plans/ → research/, reducing SDD categories from 7 to 5. Added Document Types table to 000-process.md.
-
Fixed compound-command PreToolUse hook: replaced jq (not installed) with Python for JSON parsing. Also blocks git -C pattern.
[0.18.0] - 2026-03-18¶
Added¶
- S3 backend now populates FileInfo.digest from x-amz-checksum-*: get_file_info calls HeadObject with ChecksumMode: ENABLED unconditionally, returning both metadata and any checksum headers in a single request. The base64-encoded checksum is converted to a hex ContentDigest. Listing paths (list_files, iter_children) still return digest=None to avoid per-file overhead. (ID-098, S3-024)
- S3 backend now populates FileInfo.etag: _info_to_fileinfo strips the surrounding double quotes from the S3 ETag and stores it as a lowercase string. (ID-096, S3-023)
- Azure backend now populates FileInfo.etag and FileInfo.digest: _props_to_fileinfo strips and lowercases the Azure blob ETag (etag), and converts content_settings.content_md5 bytes to a ContentDigest("md5", hex) when the blob was uploaded with Content-MD5 set. (ID-097, AZ-034)
- ContentDigest frozen dataclass: immutable model with algorithm: str and value: str (both lowercase-normalized, validated). Convenience content_digest() function in ext.integrity. (ID-095, CDG-001–CDG-003)
- FileInfo.digest and FileInfo.etag fields: digest: ContentDigest | None for verified checksums, etag: str | None for opaque server tags. FileInfo.checksum is removed (pre-1.0, no deprecation shim). (ID-095, CDG-004)
- ext.streams module: composable BinaryIO wrappers for progress tracking and checksum computation: ProgressReader, ProgressWriter, ChecksumReader, ChecksumWriter, read_with_progress(). Stream-level primitives that compose with any BinaryIO, including from open_atomic(). (ID-092)
- ext.integrity module: pure functions for checksum verification over Store's public API: checksum(), verify(), verify_hex(). (ID-093)
- ProxyStore base class: shared delegation base for ObservedStore and CachedStore. Centralizes private-attribute coupling, provides default delegation for all Store methods, and enables child() propagation. Internal only, not part of the public API. (ID-094, ADR-0014)
- HTTP backend: HEAD fallback for CDN-blocked servers. When HEAD returns 401/403, exists(), get_file_info(), and check_health() retry with GET + Range: bytes=0-0. The result is cached for the backend's lifetime. Discovered during live testing against CDN-fronted endpoints. (ID-085)
- @pytest.mark.os_sensitive CI marker: macOS and Windows CI jobs now run only tests that exercise OS-specific behaviour (path separators, atomic writes via os.replace, local filesystem operations). Network-protocol backends (HTTP, S3, SFTP) are Linux-only. Reduces cross-platform CI time significantly. (ID-087)
- Medallion + Dagster showcase (examples/medallion_dagster/): self-contained Dagster project demonstrating 4 extensions composing over live MeteoSwiss weather data in a Bronze/Silver/Gold medallion architecture. Uses ReadOnlyHttpBackend, ext.cache, ext.otel, and ext.dagster. (BK-008)
- Read-only HTTP backend (ReadOnlyHttpBackend): read files from HTTP/HTTPS URLs. Capabilities: {READ, METADATA}. Zero runtime dependencies (uses stdlib urllib); optional requests and httpx transports via extras for connection pooling. Install with pip install "remote-store[requests]" or pip install "remote-store[httpx]". (ID-082)
- Conformance suite capability gates: WRITE, DELETE, LIST, MOVE, COPY capabilities are now gated in the backend conformance suite, enabling testing of partial-capability backends.
- ext.dagster, a Dagster IO Manager adapter (ID-075 v1): wraps any existing Store as a Dagster IOManager via remote_store_io_manager(). Pluggable serialization (pickle, JSON, Parquet). Install with pip install "remote-store[dagster]". Spec 031-ext-dagster.md (DAG-001 through DAG-011).
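The two normalizations described for the S3 entries above (base64 checksum to hex, quoted ETag to bare lowercase string) are easy to reproduce with the standard library. The function names below are hypothetical; they are not the backends' private helpers.

```python
import base64
import hashlib


def checksum_b64_to_hex(value: str) -> str:
    """Convert a base64-encoded checksum header value to lowercase hex."""
    return base64.b64decode(value).hex()


def normalize_etag(raw: str) -> str:
    """Strip the surrounding double quotes from an ETag and lowercase it."""
    return raw.strip('"').lower()


# Simulate what an x-amz-checksum-sha256 header would carry for this payload.
payload = b"hello"
header_value = base64.b64encode(hashlib.sha256(payload).digest()).decode("ascii")

hex_digest = checksum_b64_to_hex(header_value)
etag = normalize_etag('"D41D8CD98F00B204E9800998ECF8427E"')
```

The hex form matches what hashlib.sha256(...).hexdigest() produces locally, which is what makes a digest comparable across backends.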
Changed¶
- ext.transfer.download() now uses ProgressReader wrapper: progress tracking in download() is now consistent with upload() and transfer(), using the ProgressReader stream wrapper instead of an inline callback. (ID-006, XFER-009)
- ext.transfer now uses public ProgressReader from ext.streams instead of its private _ProgressReader. No public API change. (ID-091)
- ObservedStore and CachedStore now extend ProxyStore: reduces boilerplate, centralizes delegation, and removes duplicated init coupling.
Fixed¶
- child() now propagates proxy behavior in ObservedStore and CachedStore. Previously, cached_store(s).child("sub") returned a plain Store, silently losing caching/observation. (BUG-003)
- pydantic_to_registry_config() now unwraps SecretStr fields: Pydantic SecretStr values in backend options dicts are automatically converted to plain strings before reaching from_dict(), so sensitive-key detection wraps them in Secret correctly. Previously, SecretStr objects bypassed the isinstance(v, str) check and were not wrapped.
Documentation¶
- Backend API reference pages (ID-088) — added class documentation for all 7 backends (Local, Memory, HTTP, S3, S3-PyArrow, SFTP, Azure) under a new "Backends" section in the API reference. Each page links to the corresponding backend guide.
- Extensions API reference section (ID-089) — moved all 11 extension API pages into a nested "Extensions" section with an index page, matching the Backends section structure.
- Docs landing page (ID-090) — replaced the 1:1 README include with a purpose-built orientation page: architecture diagram, six key messages, quick start, and navigation links.
Removed¶
- Top-level re-exports of optional-dependency extensions (ADR-0013): from remote_store import pyarrow_fs and similar shortcuts for arrow, otel, pydantic, and yaml extensions are removed. Use the canonical import path instead: from remote_store.ext.arrow import pyarrow_fs. Pure-Python extensions (batch, cache, glob, observe, partition, transfer) are unchanged.
[0.17.0] - 2026-03-14¶
Added¶
-
AzureBackend(max_concurrency=) parameter (ID-076): controls parallel connections for blob uploads and downloads. Default 1 (sequential, matching prior behavior). Set higher for improved throughput on large files.
-
FolderInfo.name property (ID-079): derived @property returning the final path component (self.path.name). FolderInfo now satisfies the PathEntry protocol alongside FileInfo and FolderEntry.
-
FolderEntry dataclass and PathEntry protocol (ID-072): FolderEntry is an immutable identity object returned by listing operations with .name and .path attributes. PathEntry is a runtime-checkable protocol satisfied by both FileInfo and FolderEntry, enabling uniform iteration.
-
Store.write_text() convenience method (ID-074): writes a string to a file with configurable encoding. Wraps write() with encoding and overwrite parameters matching pathlib.Path.write_text(). Store-level only (no backend changes). ext.observe on_write hook, ext.cache routes through write. Spec 030-write-text.md (WTXT-001 through WTXT-006).
Changed¶
-
Docstrings migrated from Sphinx to Google style (ID-080): all 367 Sphinx-style markers (:param:, :returns:, :raises:) across 25 source files converted to Google-style sections (Args:, Returns:, Raises:). mkdocs.yml updated to docstring_style: google. sdd/DESIGN.md §4 updated with the new convention. Unlocks inline admonitions and markdown cross-references inside docstrings.
-
S3 listing methods no longer call exists() before listing (ID-062): removes a redundant API round-trip from list_files, list_folders, and iter_children in S3Backend and S3PyArrowBackend. The existing FileNotFoundError handler already covers non-existent paths.
-
list_folders() returns Iterator[FolderEntry] (ID-072): was Iterator[str]. Use .name for the folder name, .path for the full path.
-
iter_children() returns Iterator[FileInfo | FolderEntry] (ID-072): was Iterator[FileInfo | str]. Use isinstance(entry, FolderEntry) instead of isinstance(entry, str) to distinguish folders from files.
-
Store docstring rewrite (ID-074): rewrote all Store method docstrings for accuracy and consistency. Fixed write/write_atomic str claim, corrected read_text errors reference.
-
store.md restructured with per-method ::: directives (ID-074): individual method headings, admonitions for ordering, atomicity, metadata, and thread-safety. Added backend behavior matrix verified against backend source.
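For callers migrating across the ID-072 return-type change, the isinstance switch looks roughly like this. The two dataclasses are simplified stand-ins defined here so the sketch runs on its own: the real FileInfo and FolderEntry carry more fields, and their path attribute is assumed here to be a plain string for brevity.

```python
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass(frozen=True)
class FolderEntry:  # stand-in, not the library class
    name: str
    path: str


@dataclass(frozen=True)
class FileInfo:  # stand-in, not the library class
    name: str
    path: str
    size: int


def iter_children_stub() -> Iterator[Union[FileInfo, FolderEntry]]:
    """Plays the role of Store.iter_children() in this sketch."""
    yield FolderEntry(name="raw", path="data/raw")
    yield FileInfo(name="readme.txt", path="data/readme.txt", size=12)


folders, files = [], []
for entry in iter_children_stub():
    # Before ID-072 the folder check was isinstance(entry, str).
    if isinstance(entry, FolderEntry):
        folders.append(entry.name)
    else:
        files.append(entry.name)
```

Because both kinds of entry expose .name and .path, code that only needs those attributes can skip the isinstance check entirely.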
Docs¶
-
README medium pass (ID-081) — streamlined onboarding flow, added backend behavior matrix, restored correct extras and library names, fixed method count (27).
-
Docs site polish (ID-064): property return types now visible (show_signature_annotations), Fira Code font for code blocks, sticky navigation tabs, search suggest/highlight, tighter parameter list spacing, capability matrix icons.
[0.16.0] - 2026-03-10
Added
- `Store.read_text()` convenience method (ID-056): reads a file and decodes to string. Wraps `read_bytes()` with `encoding` and `errors` parameters matching `pathlib.Path.read_text()`. Store-level only (no backend changes). `ext.observe` `on_read` hook, `ext.cache` routes through cached `read_bytes`. Spec `028-read-text.md` (RTXT-001 through RTXT-006).
- `Store.iter_children()` combined listing (ID-055): yields both files (`FileInfo`) and folders (`str`) in a single pass, avoiding two round-trips. All 6 backends override with single-call implementations. `ext.observe` `on_list` hook, `ext.cache` caching and invalidation. Spec `027-iter-children.md` (ITER-001 through ITER-008).
- `Store.ping()` health check (ID-054): lightweight, non-destructive backend connectivity verification. Delegates to `Backend.check_health()`. Per-backend strategies: Local (`exists` + `os.access`), S3 (`head_bucket`), S3-PyArrow (`get_file_info`), SFTP (`stat`), Azure (`get_container_properties`), Memory (no-op). `ext.observe` `on_ping` hook. Spec `026-health-check.md` (PING-001 through PING-010).
- `RetryPolicy` dataclass (ID-010): unified retry configuration for transient backend errors. Frozen dataclass with `max_attempts`, `backoff_base`, `backoff_max`, `jitter`, and `timeout` fields. Each backend maps the policy to its native retry mechanism: SFTP (tenacity), S3 (botocore), Azure (`ExponentialRetry`), S3-PyArrow (PyArrow C++ + botocore). `RetryPolicy.disabled()` factory for single-attempt mode. Configurable via constructor (`retry=RetryPolicy(...)`) or dict config (`"retry": {"max_attempts": 5}`). ADR-0011, spec `025-retry-policy.md`.
- `SFTPUtils` utility class: groups `load_private_key` and `HostKeyPolicy` into a public re-export (`from remote_store.backends import SFTPUtils`). Replaces private `backends._sftp` imports in user-facing code.
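As a rough illustration of what a frozen retry-policy dataclass with those five fields might look like, here is a self-contained sketch. `RetryPolicySketch`, its defaults, and the capped exponential `delay()` formula are all assumptions made for this example. The library itself maps the policy onto each backend's native retry mechanism rather than computing delays this way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RetryPolicySketch:
    """Illustrative only: field names follow the entry, defaults are assumed."""
    max_attempts: int = 3
    backoff_base: float = 0.5        # assumed: delay before the first retry, in seconds
    backoff_max: float = 30.0        # assumed: upper bound on any single delay
    jitter: bool = False
    timeout: Optional[float] = None

    @classmethod
    def disabled(cls) -> "RetryPolicySketch":
        """Single-attempt mode: the operation is tried once and never retried."""
        return cls(max_attempts=1)

    def delay(self, retry_number: int) -> float:
        """Capped exponential backoff for the n-th retry (1-based), jitter ignored."""
        return min(self.backoff_base * 2 ** (retry_number - 1), self.backoff_max)


policy = RetryPolicySketch(max_attempts=5, backoff_base=0.5, backoff_max=4.0)
# Five attempts means at most four retries, hence four delays.
delays = [policy.delay(n) for n in range(1, policy.max_attempts)]
```

Because the dataclass is frozen, a policy can be shared between backends without risk of one caller mutating it for another.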
Changed
- Authoritative docs restructured to ADF standard (ID-059): `sdd/DESIGN.md` trimmed to code style conventions only (sections 1-10 removed, duplicated specs). `sdd/DOCUMENTATION.md` condensed to rules + guides (~130 lines from ~456). `sdd/000-process.md` restructured to Intent/Rules/Guides (~75 lines from ~152). Audit files moved to `sdd/audits/`. `CONTRIBUTING.md` spec format section replaced with cross-ref to `000-process.md`. `CLAUDE.md` environment note removed, gh CLI `Forbidden operations` denylist replaced with ask-gated confirmation.
- `from_yaml()` moved from `RegistryConfig` classmethod to `ext/yaml.py` (ID-002): YAML config loading requires an optional dependency (`pyyaml` or `ruamel.yaml`), same as the Pydantic adapter. Moved to `ext.yaml` for consistency with the extension architecture (ADR-0008). Import changes: `from remote_store.ext.yaml import from_yaml`.
Docs
- RTD docs now default to stable release: changed all docs deep links in user-facing files (README, guides) from `/en/latest/` to `/stable/`, dropping the `/en/` language prefix (single-language project) and pointing to the most recent PyPI release instead of unreleased master. Updated DOCUMENTATION.md canonical URL policy and CONTRIBUTING.md release checklist. Requires RTD admin: default version = `stable`, URL versioning scheme = `/version/path/`.
- README API table audit: added missing `iter_children()` to Browse & Inspect section, added 5 missing example scripts to Examples table (`caching`, `config_loaders`, `capabilities_and_errors`, `path_model`, `retry_policy`), added `ext.yaml` to Extensions table, updated method count from 23 to 26 in comparison table, fixed stale PyArrow `native_path()` limitation note. Added `ext-yaml.md` API reference page and nav entry.
- Audit 003 fixes (AF-022 through AF-040): documentation quality audit follow-up. 16 findings fixed, 3 closed as non-defects. Key changes: 7 missing example doc pages added, observe hook table completed (`on_ping`, `open_atomic`), private imports replaced with public API in 4 guides, `CacheBackend` protocol docstrings added, CONTRIBUTING.md spec listing simplified (no longer goes stale), mkdocstrings `show_if_no_docstring: false` for proxy class overrides.
[0.15.0] - 2026-03-08
Added
- `hatch run notebooks` smoke-test runner (ID-048): lightweight script (`tests/scripts/run_notebooks.py`) that executes tutorial notebook code cells via `exec()` without requiring Jupyter. Wired into `hatch run all` and CI `examples` job. Skips `benchmark_analysis.ipynb` (needs pre-generated data).
- `Store.open_atomic(path, overwrite=False)`: context manager for streaming atomic writes (ID-026, SAW-001 through SAW-015). Yields a writable file object backed by a temporary location; on successful exit the file is atomically promoted to the target path, on exception the temporary artifact is cleaned up. Eliminates the memory-buffering requirement of `write_atomic()` for large files. All 6 backends supported.
- `Backend.open_atomic(path, overwrite=False)`: new abstract method on the Backend ABC. Per-backend temp-path strategies: `mkstemp` + `os.replace` (Local), `.~tmp.*` + `posix_rename` (SFTP), `SpooledTemporaryFile` + PUT (S3, S3-PyArrow, Azure non-HNS), temp blob + DFS rename (Azure HNS), `BytesIO` buffer (Memory).
- Data lake medallion notebook (`examples/notebooks/04_data_lake_medallion.ipynb`): end-to-end Bronze/Silver/Gold pipeline using `Store.child()`, PyArrow, Polars, and DuckDB. Generates ~3,500 sensor readings with realistic quality issues, cleans through medallion layers, and runs analytical queries on gold. Runs entirely on `MemoryBackend`.
- `Store.native_path(key)`: converts a store-relative key to the backend-native path (STORE-015). Inverse of `to_key()`. Used by the PyArrow adapter for Tier 1 fast-path reads.
- `Backend.native_path(path)`: converts a backend-relative key to the backend-native path (BE-025). Default is identity; `S3PyArrowBackend` prepends bucket prefix.
- PyArrow adapter Tier 1 native fast-path reads (ID-037, PA-010): `StoreFileSystemHandler` now probes for a native PyArrow filesystem at construction via `store.unwrap(pyarrow.fs.FileSystem)`. When available (e.g., `S3PyArrowBackend`), `open_input_file` delegates directly to the native FS, bypassing Python I/O for zero GIL overhead with C++ range requests and I/O coalescing.
- `S3PyArrowBackend.unwrap()` now accepts `pyarrow.fs.FileSystem` base class in addition to `pyarrow.fs.S3FileSystem`.
- Parallel batch operations (ID-035): `batch_delete`, `batch_copy`, and `batch_exists` now accept `concurrent=True` and `max_workers=N` keyword arguments for parallel execution via `ThreadPoolExecutor`. Cloud backends benefit significantly from concurrent I/O over sequential execution. `stop_on_error` is incompatible with `concurrent=True` (raises `ValueError`). Spec: BATCH-020 through BATCH-025.
- `ext.cache`, store-level caching middleware (ID-025): `cached_store(store, ttl=300)` wraps a Store in a proxy that caches read-only operations (`exists`, `is_file`, `is_folder`, `read_bytes`, `get_file_info`, `get_folder_info`, `list_files`, `list_folders`, `glob`) with TTL-based expiration. All mutating operations automatically invalidate affected entries. `max_content_size` limits memory for large files. Thread-safe. Spec: CACHE-001 through CACHE-015.
- `ext.partition`, Hive-style partition path helpers (ID-036): `partition_path(filename, **partitions)` builds paths like `year=2026/month=03/data.parquet`, `parse_partition(path)` extracts the partition dict and filename. Pure Python, zero dependencies. Spec: PART-001 through PART-013.
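The Hive-style layout behind the partition helpers can be illustrated with a minimal, self-contained sketch. The two `_sketch` functions here are simplified re-implementations written for this example. Their return shapes and their handling of non-partition segments are assumptions, not the behaviour specified in PART-001 through PART-013.

```python
from typing import Dict, Tuple


def partition_path_sketch(filename: str, **partitions: object) -> str:
    """Join key=value segments (in keyword order) and append the filename."""
    segments = [f"{key}={value}" for key, value in partitions.items()]
    return "/".join([*segments, filename])


def parse_partition_sketch(path: str) -> Tuple[Dict[str, str], str]:
    """Split a partitioned path back into (partition dict, filename)."""
    *segments, filename = path.split("/")
    partitions: Dict[str, str] = {}
    for segment in segments:
        key, sep, value = segment.partition("=")
        if sep:  # assumption: segments without "=" are simply skipped
            partitions[key] = value
    return partitions, filename


path = partition_path_sketch("data.parquet", year=2026, month="03")
parsed = parse_partition_sketch(path)
```

Passing `month="03"` as a string keeps the leading zero. Values come back from parsing as strings, so any numeric conversion is left to the caller in this sketch.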
Documentation
- Documentation overhaul (DOC-001): Diataxis nav restructure (Getting Started / Guides / Reference / Explanation), extension API reference pages for all 9 ext modules, 7 new content pages (capabilities matrix, choosing a backend, troubleshooting, migration, architecture overview, security model, further reading), research docs surfaced on site, docstring audit for Store/Backend/errors with complete `:param:`/`:returns:`/`:raises:` and examples, cross-links between guides and API reference pages.
[0.14.0] - 2026-03-07
Changed
- `_stacklevel` removed from public `from_dict()` signature (ID-043): Internal `_stacklevel` parameter no longer leaks into the public `RegistryConfig.from_dict()` API. Warning stack-level control is now handled via a private `_from_dict()` helper.
Fixed
- `Registry.get_store()` no longer owns the shared backend (ID-041): Stores returned by `get_store()` now set `_owns_backend = False`, preventing a store's `close()` from shutting down the cached backend and breaking sibling stores. `Registry.close()` remains the lifecycle owner.
- `Store.move()` and `Store.copy()` short-circuit when `src == dst` (ID-040): Moving or copying a file to itself is now a uniform no-op across all backends. Source existence is verified via `is_file()` (not `exists()`), so folders at the source path correctly raise `NotFound`. Spec: STORE-008a.
Added
- Data lake patterns guide (ID-034): New guide (`guides/data-lake-patterns.md`) documenting Bronze/Silver/Gold medallion architecture using `Store.child()` + `ext.arrow` + `ext.transfer`. Covers PyArrow, Polars, DuckDB, Delta Lake integration, batch partition operations, cross-backend transfer, and testing without cloud credentials. Includes honest assessment of where remote-store fits vs. Databricks/Spark.
- Credential hygiene documentation (ID-042): Added "Credential hygiene" section to README and updated `examples/configuration.py` with `Secret` wrapping, `from_dict()` auto-wrapping, and `.reveal()` usage.
- `RegistryConfig.from_toml()`, TOML config loader (ID-005): Load config from a standalone `.toml` file or from `pyproject.toml` via `table=("tool", "remote-store")`. Zero dependencies on Python 3.11+; optional `tomli` backport for 3.10. Spec: CFG-008, CFG-009.
- `RegistryConfig.from_yaml()`, YAML config loader (ID-002): Load config from a YAML file. Accepts `pyyaml` (primary) or `ruamel.yaml` (fallback). Spec: CFG-010, CFG-011.
- Unknown top-level key warning in `from_dict()` (CFG-012): `from_dict()` now emits `UserWarning` for unrecognized keys like `"backend"` (typo for `"backends"`), preventing silently empty configs.
- `pydantic_to_registry_config()`, Pydantic adapter (ID-003): Convert any Pydantic `BaseModel` or `BaseSettings` instance to a `RegistryConfig` via `model_dump()` → `from_dict()`. Supports env-var binding, `.env` file loading, and validation via `pydantic-settings`. Optional `pydantic` extra. Spec: CFG-015, CFG-016, CFG-017.
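The unknown-key warning described for `from_dict()` follows a common pattern that can be sketched with the standard `warnings` module. `check_top_level_keys` and the `KNOWN_TOP_LEVEL_KEYS` set are hypothetical names chosen for this example. The actual set of recognised keys and the warning text are not taken from the library.

```python
import warnings
from typing import Any, Dict, List

# Assumed set of valid keys, for illustration only.
KNOWN_TOP_LEVEL_KEYS = {"backends", "stores"}


def check_top_level_keys(config: Dict[str, Any]) -> List[str]:
    """Warn once per unrecognised top-level key and return those keys."""
    unknown = sorted(set(config) - KNOWN_TOP_LEVEL_KEYS)
    for key in unknown:
        warnings.warn(
            f"Unrecognized top-level config key: {key!r}",
            UserWarning,
            stacklevel=2,  # attribute the warning to the caller, not this helper
        )
    return unknown


# "backend" is the typo mentioned in the entry; it should trigger one warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    unknown_keys = check_top_level_keys({"backend": {}, "stores": {}})
```

Warning instead of raising keeps existing configs loadable while still surfacing the typo that would otherwise yield an empty registry.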
[0.13.0] - 2026-03-03
Added
- `Secret` wrapper and credential hygiene (ID-039, SEC-001 through SEC-008): `Secret` type in `_config.py` wraps sensitive credential strings; `repr()` and `str()` return `'***'`, `.reveal()` returns the plain value. `RegistryConfig.from_dict()` auto-wraps known sensitive keys (`key`, `secret`, `password`, `account_key`, `sas_token`, `connection_string`). All backends accept `str | Secret` for credential params via `_reveal()`. SFTP coerces `host_key_policy` strings to `HostKeyPolicy` enum. `SecretRedactionFilter` logging filter scrubs `Secret` instances from log record args. Spec: `sdd/specs/020-credential-hygiene.md`.
- Intrinsic stdlib logging (ID-004, OBS-008): Core modules and extensions now use `log = logging.getLogger(__name__)` with `NullHandler` on the `"remote_store"` root logger. DEBUG for method entry, INFO for write/delete/move/copy completion. Structured `extra={}` with `op`, `path`, `backend` keys. Existing logger names standardised (`_log` -> `log`, `logger` -> `log`).
- `ext.observe`, observability hooks (ID-024, ADR-0010, OBS-001 through OBS-010): `observe(store, on_read=..., on_write=..., on_any=..., around=...)` wraps a Store in an `ObservedStore` proxy that fires callbacks after each operation. `StoreEvent` frozen dataclass carries operation, path, backend, timing, error, and metadata. `BufferedObserver` queues events for batched delivery on a background thread. Drift-protection test ensures new Store methods cannot silently bypass observation. Spec: `sdd/specs/019-ext-observe.md`.
- `ext.otel`, OpenTelemetry bridge (ID-024, OBS-011 through OBS-014): Pre-built hooks that emit OpenTelemetry spans and metrics. `otel_observe(store)` wraps a Store with distributed tracing (`store.{op}` spans with `CLIENT` kind) and three metric instruments (operations counter, errors counter, duration histogram). Depends only on `opentelemetry-api` (zero-cost no-ops without SDK). New optional extra: `pip install "remote-store[otel]"`. Spec: `sdd/specs/019-ext-observe.md` (OBS-011--OBS-014).
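The masking behaviour described for the `Secret` wrapper can be sketched in a few lines. `SecretSketch` is a stand-in written for this example, not the class in `_config.py`. The exact masked output (three asterisks with no surrounding quotes) is an assumption.

```python
class SecretSketch:
    """Illustrative wrapper: hides its value unless reveal() is called."""

    __slots__ = ("_value",)

    def __init__(self, value: str) -> None:
        self._value = value

    def __repr__(self) -> str:
        return "***"  # assumed mask format

    def __str__(self) -> str:
        return "***"

    def reveal(self) -> str:
        """Return the wrapped plain-text value (the only way to get it back)."""
        return self._value


token = SecretSketch("s3cr3t")
# Formatting or logging the wrapper never exposes the underlying value.
message = f"connecting with key={token}"
debug_view = repr({"key": token})
```

Routing every accidental string conversion through the mask is what makes f-strings, `print()`, and container reprs safe by default.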
Fixed
- `get_folder_info("")` crashed with `InvalidPath` for root folders (BUG-001): Added `RemotePath.ROOT` class-level sentinel that bypasses `__init__` validation (`str(ROOT) == "."`). Fixed all 6 backends and `_rebase_folder_info` to return `RemotePath.ROOT` for root-level queries. Store methods now accept `"."` as a root alias so that `str(folder_info.path)` round-trips correctly. Spec: `sdd/specs/004-path-model.md` (PATH-015).
[0.12.0] - 2026-03-01
Added
- S3, S3-PyArrow, and Azure native glob (BK-002, ID-007, GLOB-018/019/020): All cloud backends now override `Backend.glob()` with prefix-optimized listing and client-side regex filtering. Local, S3, S3-PyArrow, and Azure backends now declare `Capability.GLOB`. Shared glob helpers extracted to internal `_glob.py` module.
[0.11.0] - 2026-03-01
Added
- Glob pattern matching, three-tier design (ADR-0009) (BK-002, ID-007)
  - Tier 1: `list_files(pattern=…)`, universal `fnmatch` name filtering, works with every backend (needs only `LIST`)
  - Tier 2: `Store.glob()` / `Capability.GLOB`, native backend glob, capability-gated (like `unwrap()`). `LocalBackend` implements via `pathlib`
  - Tier 3: `ext.glob.glob_files()`, portable full-glob fallback with `**` recursive patterns and `[abc]`/`[!abc]` character classes; delegates to native glob when available, otherwise `list_files` + client-side regex
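Tier 1 filtering amounts to matching file names against a shell-style pattern. A minimal sketch using the standard library's `fnmatch` module is shown here. `filter_names` is a hypothetical helper, and the use of the case-sensitive `fnmatchcase` is an assumption made so the example behaves the same on every platform.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def filter_names(names: Iterable[str], pattern: str) -> List[str]:
    """Keep only the names matching a shell-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Plain file names, standing in for the names a listing call would yield.
names = ["a.csv", "b.parquet", "notes.txt", "c.csv"]
csv_files = filter_names(names, "*.csv")
ab_files = filter_names(names, "[ab].*")
```

Because matching happens on names after listing, this tier needs nothing from the backend beyond the ability to list.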
Changed
- Beta status. Project classifier changed from Alpha to Beta. Core API (Store, Registry, Backend, models, errors) is now considered stable. See CONTRIBUTING.md § Stability tiers.
[0.10.0] - 2026-02-28
Added
- Extension namespace contract (ADR-0008): formalized the `ext.*` namespace contract, covering public API only, no lifecycle ownership, `CapabilityNotSupported` propagation, export rules for pure-Python vs optional-dependency extensions, development lifecycle, and third-party naming convention. Added extensions guide, expanded CONTRIBUTING.md checklist, contract enforcement tests, updated CLAUDE-REFERENCE.md ripple-check table (ID-027)
Changed
- S3-PyArrow read path optimization: removed `BufferedReader` from `S3PyArrowBackend.read()`, added `read()` + chunked `readline()` to `_PyArrowBinaryIO`, eliminating double-copy overhead on streaming reads (ID-031, RFC-0003)
- Benchmark tiered modes, backend filtering, and comparative docs: replaced binary `slow`/not-slow split with three tiers (quick/standard/full), added `--backend` filter for single-backend runs (deselects instead of skipping to avoid fixture setup), added `--bench-timeout` watchdog (Windows-compatible), added `--comparative` and `--markdown` modes to `report.py` for remote-store vs raw SDK vs fsspec comparison tables, updated hatch scripts. Comparative results and performance guide now populated with measured Docker benchmark data across 4 backends (ID-020)
- Release CI, GitHub Release as single trigger: `publish.yml` now triggers on `release: types: [published]` instead of `push: tags: ["v*"]`. The GitHub Release becomes the single event that triggers PyPI publish (ID-028)
- Versioned documentation with mike: `docs.yml` split into two jobs, `deploy-dev` (master push deploys "dev" alias) and `deploy-release` (release published deploys versioned docs with "latest" alias). Version switcher dropdown added to docs site. Requires changing GitHub Pages source to "Deploy from a branch" (gh-pages) (ID-029)
[0.9.0] - 2026-02-28
Added
- Transfer operations (`ext.transfer`): `upload`, `download`, and `transfer` functions for moving data between local files and Stores or between two Stores. All streaming (never loads full file into memory), with optional `on_progress` callback per chunk. `upload` streams a local file to a Store, `download` reads in 1 MiB chunks to a local file, `transfer` pipes between any two Stores. Supports `overwrite` flag. Pure Python, no extra dependencies, unconditional top-level export (ID-023, unifies ID-001 + ID-009)
- Batch operations (`ext.batch`): `batch_delete`, `batch_copy`, and `batch_exists` convenience functions for operating on collections of paths. Sequential execution with error aggregation via `BatchResult` (succeeded/failed split). Supports `stop_on_error`, `missing_ok`, and `overwrite` options. Pure Python, no extra dependencies, unconditional top-level export (ID-022)
- PyArrow FileSystem adapter (Phase 1): `StoreFileSystemHandler` wraps any `Store` into a `pyarrow.fs.PyFileSystem`, enabling seamless interop with PyArrow datasets, Pandas, Polars, DuckDB, PyIceberg, and Delta Lake. Includes `pyarrow_fs()` convenience factory, `_StoreSink` write buffer with spill-to-disk, tiered read strategy (Tier 2 BufferReader for small files, Tier 3 PythonFile for large seekable files), complete error mapping (PA-019/020), and conditional top-level export. Install with `pip install "remote-store[arrow]"`. Tier 1 native fast-path deferred to Phase 2 (ID-016)
- `Store.unwrap(type_hint)`: delegates to `Backend.unwrap()`, exposing the backend's native handle through the public Store surface. Used by the PyArrow adapter and available to all callers (STORE-013)
- Concurrency and atomicity guide: new `guides/concurrency.md` documenting TOCTOU race on `overwrite=False` (all backends) and non-atomic `move()` (S3, S3-PyArrow, Azure non-HNS, SFTP fallback), with per-backend summary table and practical workarounds. Cross-referenced from all backend guides (AF-010)
- Capability gating tests: 14 tests verifying all 12 Store methods that require a capability raise `CapabilityNotSupported` when the backend lacks it, with correct `.capability` attribute value and backend name propagation (AF-012, STORE-006)
- S3 and SFTP error path tests: mock-based tests for `PermissionDenied` (S3-016: HTTP 403/accessdenied, SFTP-021: `errno.EACCES`), `AlreadyExists` (SFTP-022: `errno.EEXIST`), and `BackendUnavailable` (S3-017: endpoint/connect/timeout/dns errors, SFTP-023: `paramiko.SSHException`). Removed `pragma: no cover` from now-tested `_classify_error`/`_map_exception` branches (AF-013)
- CI gate in publish workflow: `publish.yml` now runs lint, typecheck, and tests (Python 3.10 + 3.13) before building and publishing to PyPI, preventing broken tags from reaching the registry (AF-014)
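The streaming behaviour described for the transfer functions (fixed-size chunks, optional progress callback) can be illustrated with plain file-like objects. `copy_stream` is a hypothetical helper written for this sketch. The callback signature (bytes written per chunk) is an assumption, not the documented `on_progress` contract.

```python
import io
from typing import BinaryIO, Callable, Optional

CHUNK_SIZE = 1024 * 1024  # 1 MiB, the chunk size the entry mentions for download


def copy_stream(
    src: BinaryIO,
    dst: BinaryIO,
    on_progress: Optional[Callable[[int], None]] = None,
    chunk_size: int = CHUNK_SIZE,
) -> int:
    """Copy src to dst chunk by chunk; return the total number of bytes copied."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means end of stream
            break
        dst.write(chunk)
        total += len(chunk)
        if on_progress is not None:
            on_progress(len(chunk))  # assumed: callback receives bytes per chunk
    return total


# Two full chunks plus a 10-byte tail, copied between in-memory streams.
source = io.BytesIO(b"x" * (2 * CHUNK_SIZE + 10))
sink = io.BytesIO()
progress = []
copied = copy_stream(source, sink, on_progress=progress.append)
```

At most one chunk is held in memory at a time, which is the property that lets such a loop handle files larger than available RAM.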
[0.8.0] - 2026-02-27
Added
- `Store.child(subpath)`, runtime sub-scoping: returns a new Store scoped to a subfolder, sharing the parent's backend instance (no new connections). Child stores do not close the shared backend on `close()` or context manager exit. Validated via `RemotePath`, chainable (`store.child("a").child("b")`), equality-transparent with directly constructed stores. Spec: `015-store-child.md` (ID-021)
- Cloud backend examples: 5 new example scripts (`s3_backend.py`, `s3_pyarrow_backend.py`, `sftp_backend.py`, `azure_backend.py`, `store_child.py`) demonstrating each backend with self-contained env-var configuration and graceful failure messages. All Store API methods now have example coverage
- Claude Code reusable skills: 6 slash-command skills in `.claude/commands/` codifying recurring workflows, namely `/ripple-check` (cross-reference validation), `/release` (6-phase release checklist), `/add-backend` (12-step scaffolding), `/backlog-sync` (backlog update helper), `/pr-preflight` (11-check pre-submission validation), `/add-spec` (SDD spec + test scaffolding) (ID-030)
Changed
- Release checklist expanded: replaced the 5-item release checklist in CONTRIBUTING.md with a 6-phase process covering pre-flight, content freeze, version bump, validation, ship with PR review gate, and post-release verification. GitHub Release is the intended single trigger for PyPI publish and docs deploy (ID-028, ID-029 track the CI changes)
Fixed
- `streaming_io.py` example leaked file handles on Windows: `store.read()` streams were not closed before `TemporaryDirectory` cleanup, causing `PermissionError` on Windows due to file locking. Streams are now used as context managers
[0.7.0] - 2026-02-27
Added
- `MemoryBackend`, in-memory backend: tree-indexed, zero dependencies, no filesystem access. Supports all 8 capabilities with zero conformance test skips. Primary use cases: unit testing, interactive exploration, documentation examples, CI speed. Registered as `"memory"` backend type, always available (no optional extra). Store test fixtures migrated from `LocalBackend` + `tempfile` to `MemoryBackend` (ID-017)
- PyArrow FileSystemHandler adapter spec: drafted `sdd/specs/014-pyarrow-filesystem-adapter.md` for `StoreFileSystemHandler` wrapping any `Store` into a `pyarrow.fs.PyFileSystem`. Tiered read strategy (native fast path / BufferReader / PythonFile), spill-to-disk writes, complete error mapping (ID-016)
- Backend `__repr__` with credential masking: all 6 backends now implement `__repr__()`. Secrets display as `'***'` when set and `None` when unset; identifiers (bucket, host, container) are shown in clear text (AF-008)
Changed
- S3/S3-PyArrow `get_folder_info()` on empty folders: no longer raises `NotFound`; the `exists()` check already gates non-existent paths. Azure non-HNS retains current behavior since virtual folders can't be empty (AF-004)
- `Registry.close()` error handling: now closes all backends even if one raises, always clears the cache, and re-raises the first error (AF-009)
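The close-everything-then-re-raise behaviour described for `Registry.close()` is a general cleanup pattern. A minimal sketch follows. `close_all` and `_FakeBackend` are hypothetical names used only here, and the registry's real internals may be organised differently.

```python
from typing import Dict, Optional


class _FakeBackend:
    """Test double: records that close() was called and optionally fails."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.closed = False

    def close(self) -> None:
        self.closed = True
        if self.fail:
            raise RuntimeError(f"{self.name} failed to close")


def close_all(cache: Dict[str, _FakeBackend]) -> None:
    """Close every backend, always clear the cache, then re-raise the first error."""
    first_error: Optional[Exception] = None
    try:
        for backend in list(cache.values()):
            try:
                backend.close()
            except Exception as exc:
                if first_error is None:
                    first_error = exc  # remember the first failure, keep going
    finally:
        cache.clear()
    if first_error is not None:
        raise first_error


backends = {
    "a": _FakeBackend("a", fail=True),
    "b": _FakeBackend("b"),
    "c": _FakeBackend("c", fail=True),
}
handles = list(backends.values())
raised = None
try:
    close_all(backends)
except RuntimeError as exc:
    raised = str(exc)
```

Deferring the raise until after the loop is what guarantees that one failing backend cannot leave its siblings open.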
Removed
- `RemoteFile`/`RemoteFolder` model classes: removed dead code from models, `__all__`, tests, docs, and specs (AF-011)
Fixed
- README Azure SDK name: corrected from wrong package name to `azure-storage-file-datalake` (AF-015)
- CONTRIBUTING.md: added spec 012 reference (AF-015)
- Azure configuration example: added to `examples/configuration.py` (AF-015)
[0.6.0] - 2026-02-25
Added
- `DirectoryNotEmpty` error type: new `RemoteStoreError` subclass raised when a non-recursive folder delete targets a non-empty folder. Replaces generic `NotFound` with a descriptive error (AF-005)
- `_ErrorMappingStream`: `io.RawIOBase` proxy that wraps streams returned by `Backend.read()`, catching `OSError` during lazy reads and mapping them through each backend's error classifier. Prevents native exceptions from leaking after `_errors()` context manager exits (AF-006)
- Auto-registration of all backends: `_register_builtin_backends()` now registers S3, SFTP, and S3-PyArrow backends (in addition to local and Azure) when their dependencies are installed (AF-001)
- SFTP `_map_exception()` method: single source of truth for SFTP error classification, used by both `_errors()` and `_ErrorMappingStream` (AF-006)
- SFTP empty folder support: `get_folder_info()` on an empty SFTP directory now returns `FolderInfo(file_count=0)` instead of raising `NotFound` (AF-004)
Changed
- BREAKING: Removed `Capability.GLOB` and `Capability.RECURSIVE_LIST` enum members that had no corresponding backend methods (AF-002)
- S3/S3-PyArrow `close()`: no longer calls `clear_instance_cache()`, which was a global side-effect affecting all s3fs instances in the process (AF-003)
- Azure/S3-PyArrow `read()`: eliminated double-buffering by wrapping `_ErrorMappingStream` directly in `BufferedReader` instead of nesting two `BufferedReader` layers
Fixed
- Lazy stream error mapping: `OSError` raised during `stream.read()` after `Backend.read()` returns is now properly mapped to `RemoteStoreError` subtypes instead of leaking as raw exceptions (AF-006)
- Exception chaining: stream error mapping uses `from exc` to preserve original traceback for debugging
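The lazy error-mapping idea (an `io.RawIOBase` proxy that translates `OSError` raised during a later read, chained with `from exc`) can be sketched as follows. `ErrorMappingStreamSketch`, `StoreError`, and `_FlakyStream` are stand-ins defined for this example. The library's actual `_ErrorMappingStream` and its per-backend classifiers are not reproduced.

```python
import io
from typing import Callable


class StoreError(Exception):
    """Stand-in for a normalized store error type."""


class ErrorMappingStreamSketch(io.RawIOBase):
    """Proxy that maps OSError from the wrapped stream during lazy reads."""

    def __init__(self, inner, classify: Callable[[OSError], Exception]) -> None:
        super().__init__()
        self._inner = inner
        self._classify = classify

    def readable(self) -> bool:
        return True

    def readinto(self, buffer) -> int:
        try:
            data = self._inner.read(len(buffer))
        except OSError as exc:
            # Chain with "from exc" so the original traceback is preserved.
            raise self._classify(exc) from exc
        count = len(data)
        buffer[:count] = data
        return count


class _FlakyStream:
    """Test double: the first read succeeds, later reads raise OSError."""

    def __init__(self) -> None:
        self._calls = 0

    def read(self, size: int = -1) -> bytes:
        self._calls += 1
        if self._calls > 1:
            raise OSError("connection reset")
        return b"abc"[:size]


stream = ErrorMappingStreamSketch(_FlakyStream(), lambda exc: StoreError(str(exc)))
first = stream.read(3)
cause = None
try:
    stream.read(3)
except StoreError as exc:
    cause = exc.__cause__
```

Because the mapping lives in `readinto`, it also applies when the proxy is wrapped in an `io.BufferedReader`, long after the call that opened the stream has returned.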
[0.5.0] - 2026-02-23
Added
- Azure backend (`AzureBackend`): new built-in backend for Azure Blob Storage and ADLS Gen2 using `azure-storage-file-datalake` directly. Adapts at runtime to Hierarchical Namespace (HNS) accounts for atomic rename and real directories, while remaining fully functional on plain Blob Storage. Install with `pip install "remote-store[azure]"`. (BK-001, spec 012)
- Streaming reads for Azure: `read()` returns a forward-only streaming `BinaryIO` via `_AzureBinaryIO` adapter wrapping `StorageStreamDownloader.chunks()`, consistent with other backends
- Azurite CI integration: Azure backend tests run against Azurite Docker emulator in CI
- Azure backend guide: `guides/backends/azure.md` with installation, auth options, HNS vs non-HNS behavior, and Azurite local development
Changed
- SIO-001 seekability clarification: `read()` streams are not guaranteed to be seekable; seekability is a backend-level property. Callers needing seekability should use `read_bytes()` + `BytesIO`
- AZ-020 spec updated: changed from BytesIO wrapper to streaming adapter
[0.4.4] - 2026-02-23
Added
- Community standards — CODE_OF_CONDUCT.md (Contributor Covenant v2.1), SECURITY.md (vulnerability reporting policy), issue templates (bug report + feature request), PR template, and CODEOWNERS
- Dependabot — automated dependency updates for pip and GitHub Actions (weekly, Mondays)
- CodeQL — GitHub code scanning workflow for Python on push/PR and weekly schedule
- Security section in README linking to vulnerability reporting
- Streaming conformance tests — 5 tests (x4 backends) that prevent regression of v0.4.3 streaming fixes: not-BytesIO assertion, chunked reads, stream position, BinaryIO write, and write-from-current-position (SIO-001, SIO-003)
[0.4.3] - 2026-02-19
Fixed
- Streaming read/write loaded entire files into memory: all four backends (Local, S3, S3-PyArrow, SFTP) now use true streaming for `read()` and `write()` with `BinaryIO` content, matching the spec's streaming-first intent
- SFTP copy/move buffered entire files: `copy()` and `move()` fallback now stream chunks using `_CHUNK_SIZE` instead of loading source into memory
- Broken API reference link in README: ReadTheDocs URL was missing `/en/latest/` prefix, causing 404 on PyPI
Changed
- Versioning docs consolidated: removed outdated duplicate from `sdd/000-process.md`, canonical source is now `CONTRIBUTING.md`
[0.4.2] - 2026-02-19
Fixed
- PyPI relative links broken: README example scripts, notebooks, and CONTRIBUTING.md links used relative paths (`examples/quickstart.py`, `CONTRIBUTING.md`, etc.) which resolve to 404 on PyPI; converted all to absolute GitHub URLs
[0.4.1] - 2026-02-19
Fixed
- PyPI logo broken: README image used relative path (`assets/logo.png`) which doesn't resolve on PyPI's CDN; changed to absolute raw GitHub URL
- Documentation site out of date: specs 010 (native path resolution) and 011 (S3-PyArrow backend) and ADR-0005 were missing from the MkDocs site and navigation
- Navigation on RTD: added `navigation.instant` to MkDocs Material config so sidebar stays visible across page loads
Added
- PyPI version, Python versions, Read the Docs, and license badges in README
- Read the Docs publishing (`remote-store.readthedocs.io`)
- "Going Public" section in DEVELOPMENT_STORY.md
Changed
- `Documentation` URL in `pyproject.toml` now points to Read the Docs instead of GitHub Pages
- `CITATION.cff` URL updated to Read the Docs
- `.readthedocs.yaml` build OS bumped to ubuntu-24.04
[0.4.0] - 2026-02-19
Added
- S3-PyArrow hybrid backend: uses PyArrow's C++ S3 filesystem for reads/writes/copies (higher throughput for large files) and s3fs for listing/metadata/deletion. Drop-in alternative to `S3Backend` with the same constructor signature.
  - Install via `pip install "remote-store[s3-pyarrow]"`
  - Spec: `sdd/specs/011-s3-pyarrow-backend.md`
  - New optional extra: `s3-pyarrow` (requires `s3fs>=2024.2.0` and `pyarrow>=14.0.0`)
  - Dual `unwrap()` support: returns either `pyarrow.fs.S3FileSystem` or `s3fs.S3FileSystem`
[0.3.0] - 2026-02-18
Added
- `Store.to_key(path)`: public method to convert backend-native paths to store-relative keys
- `Backend.to_key()`: backend-level native-path-to-key conversion
- Python 3.14 support: added to CI test matrix and PyPI classifiers
- PyPI publish workflow: trusted publishing (OIDC) via GitHub Actions on `v*` tags (BL-001)
- SFTP backend documentation: `docs/backends/sftp.md` with installation, usage, and API reference (BL-002)
- CITATION.cff: enables GitHub's "Cite this repository" button (BL-005)
- Development backlog: `sdd/BACKLOG.md` for tracking release blockers, prioritized work, and ideas
- Versioning policy added to SDD process doc (`sdd/000-process.md`)
- Set up GitHub Pages docs hosting via `actions/deploy-pages` (BL-008)
Fixed
- Store round-trip bug: `list()` returned backend-relative paths that included `root_path`, breaking re-use as input to `read()`/`delete()`
- CI: fixed cross-platform `type: ignore` comments for S3 backend
Changed
- README rewritten: approachable, dev-friendly tone with scannable layout (BL-003, BL-004)
- Pinned minimum versions on public extras: `s3fs>=2024.2.0`, `paramiko>=2.2`, `tenacity>=4.0`
- Removed `typing-extensions` from core dependencies (unused; Python 3.10+ covers all needs)
- Removed `azure` extra (`adlfs`): no Azure backend exists yet; will be re-added with the backend
[0.2.0] - 2026-02-17
Added
- SFTP backend via pure paramiko with host key policies (STRICT / TOFU / AUTO_ADD), PEM key sanitization, and tenacity retry on transient SSH errors
- Simulated atomic writes (temp file + rename) with documented orphan-file caveat
- `HostKeyPolicy` enum and `load_private_key()` utility for key management
- `_sanitize_pem()` for Azure Key Vault PEM compatibility
Changed
- `sftp` optional dependency changed from `paramiko + sshfs` to `paramiko + tenacity`
- Version bumped to 0.2.0
[0.1.0] - 2026-02-14
Added
- Store: primary user-facing abstraction for folder-scoped file operations
- Registry: backend lifecycle management with lazy instantiation and context manager support
- RegistryConfig / BackendConfig / StoreProfile: declarative, immutable configuration with `from_dict()` for TOML/JSON parsing
- RemotePath: immutable, validated path value object with normalization and safety checks
- Local backend: stdlib-only reference implementation with full capability support
- Capability system: backends declare supported features; unsupported operations fail explicitly
- Normalized error hierarchy: `NotFound`, `AlreadyExists`, `InvalidPath`, `PermissionDenied`, `CapabilityNotSupported`, `BackendUnavailable`
- Streaming-first I/O: `read()` returns `BinaryIO`, `write()` accepts `bytes | BinaryIO`
- Atomic writes: `write_atomic()` via temp-file-and-rename
- Empty path support: `""` resolves to store root for folder/query operations (see ADR-0004)
- Full type safety: mypy strict mode, `py.typed` marker
- Spec-driven development: 7 specifications, 4 ADRs, full test traceability with `@pytest.mark.spec`
- Examples: 6 runnable Python scripts and 3 Jupyter notebooks
- CI: ruff, mypy, pytest (Python 3.10–3.13), example validation
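The temp-file-and-rename approach named for atomic writes can be illustrated on the local filesystem with the standard library. `write_atomic_sketch` is a hypothetical function written for this example, and the `.~tmp.` prefix is chosen arbitrarily. It is a sketch of the general technique, not the Local backend's code.

```python
import os
import tempfile


def write_atomic_sketch(path: str, data: bytes) -> None:
    """Write data to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(path) or "."
    # Same directory as the target, so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".~tmp.")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        os.replace(tmp_path, path)  # atomic replacement of the target
    except BaseException:
        # On any failure, remove the temporary artifact and re-raise.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "report.txt")
    write_atomic_sketch(target, b"v1")
    write_atomic_sketch(target, b"v2")  # replaces the first version in one step
    with open(target, "rb") as handle:
        final = handle.read()
    leftovers = os.listdir(workdir)
```

Readers of the target see either the old content or the new content, never a partially written file, because the only visible change is the final rename.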
Known Limitations
- Only the local filesystem backend is implemented. S3, Azure, and SFTP backends are planned.
- No glob/pattern matching support yet (`Capability.GLOB` is declared but unused).
- No async API (sync-only by design; compatible with structured concurrency).