Features — remote-store v0.27.0
Authoritative snapshot of what remote-store delivers in this version. Updated each release. This is the single reference for the package's feature surface.
What this package does
remote-store gives application code a single, portable file-storage API.
Write against Store once; swap the backend by changing one line of config —
the same read(), write(), and list_files() calls work across all supported
backends (see the Backends section).
Three primitives do all the work:
- Store wraps a backend and exposes capability-gated methods. An operation is available only if the backend supports it. Capability is queried at runtime with store.supports(Capability.X), so code can adapt gracefully instead of failing at import time.
- Backend is the storage-specific adapter. Built-in backends cover the common targets; a public ABC makes it straightforward to add new ones.
- Registry loads named backend and store definitions from config (TOML, YAML, dict, or Pydantic models), resolves ${ENV_VAR} placeholders, and wraps credentials in Secret automatically.
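The capability-gated pattern described above can be sketched with stand-in classes. Everything in this block (ToyStore, the three-member Capability enum, the local CapabilityNotSupported class, find_csv) is hypothetical and written only for illustration; it does not import remote-store and should not be read as the library's implementation.

```python
from enum import Enum, auto


class Capability(Enum):
    # Stand-in enum; only three members are modelled here.
    READ = auto()
    LIST = auto()
    GLOB = auto()


class CapabilityNotSupported(Exception):
    """Stand-in for the typed error of the same name."""


class ToyStore:
    """Hypothetical store that gates methods on declared capabilities."""

    def __init__(self, files, capabilities):
        self._files = dict(files)
        self._capabilities = frozenset(capabilities)

    def supports(self, capability):
        return capability in self._capabilities

    def _require(self, capability):
        # Gate: refuse the call when the capability was not declared.
        if not self.supports(capability):
            raise CapabilityNotSupported(capability.name)

    def list_files(self):
        self._require(Capability.LIST)
        return sorted(self._files)

    def glob(self, pattern):
        self._require(Capability.GLOB)
        raise NotImplementedError("native glob is out of scope for this sketch")


def find_csv(store):
    # Adapt at runtime instead of assuming every backend can glob.
    if store.supports(Capability.GLOB):
        return list(store.glob("*.csv"))
    return [p for p in store.list_files() if p.endswith(".csv")]


store = ToyStore({"a.csv": b"1", "b.txt": b"2"}, {Capability.READ, Capability.LIST})
csv_files = find_csv(store)
```

The caller checks supports() first and falls back to listing, so the same function works whether or not the toy backend declares GLOB.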
Store API
Methods are grouped by the capability that gates them.
All methods share one invariant: they raise typed errors from remote_store.errors
and never leak backend-native exceptions to the caller.
Ungated (always available)
| Method | Returns | Description |
|---|---|---|
| exists(path) | bool | Whether a file exists at the path |
| is_file(path) | bool | Whether the path resolves to a file (not a folder) |
| is_folder(path) | bool | Whether the path resolves to a folder |
| ping() | None | Health check; raises BackendUnavailable if unreachable |
| close() | None | Release backend resources |
| child(subpath) | Store | Scoped sub-store rooted at subpath |
| unwrap(type_hint) | T | Extract the underlying backend by type |
| resolve(key) | ResolutionPlan | Resolution plan for a key (type, resolved path, options) |
| native_path(key) | str | Backend-native path string for a store key |
| to_key(path) | str | Convert a native path back to a store key |
| supports(capability) | bool | Query whether a capability is active |
READ
| Method | Returns | Description |
|---|---|---|
| read(path) | BinaryIO | Open a binary stream for reading; lazy where the backend supports LAZY_READ |
| read_bytes(path) | bytes | Read the entire file into memory |
| read_text(path, *, encoding) | str | Read the entire file as a decoded string |
| read_seekable(path) | BinaryIO | Seekable binary stream; spools to a temp file if the backend is non-seekable |
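The spooling behaviour described for read_seekable() can be illustrated with the standard library alone. This is a minimal sketch, assuming a copy-into-SpooledTemporaryFile strategy; to_seekable and ForwardOnly are hypothetical names, and the library's actual spooling logic may differ.

```python
import io
import tempfile


def to_seekable(stream, max_memory=1024 * 1024, chunk_size=64 * 1024):
    """Return a seekable binary stream with the same content as `stream`."""
    if stream.seekable():
        return stream
    # Stays in memory up to max_memory bytes, then rolls over to a temp file.
    spool = tempfile.SpooledTemporaryFile(max_size=max_memory)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        spool.write(chunk)
    spool.seek(0)
    return spool


class ForwardOnly(io.RawIOBase):
    """Test double: a readable stream that refuses to seek."""

    def __init__(self, data):
        self._inner = io.BytesIO(data)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        return self._inner.readinto(buffer)


seekable = to_seekable(ForwardOnly(b"hello world"))
seekable.seek(6)
tail = seekable.read()
```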
WRITE
All write methods accept an optional metadata= mapping. If the backend
declares Capability.USER_METADATA, the mapping is persisted alongside the
file; otherwise a non-empty metadata= raises CapabilityNotSupported.
All write methods return a WriteResult (see Data Models).
| Method | Returns | Description |
|---|---|---|
| write(path, content, *, overwrite, metadata) | WriteResult | Create or overwrite a file from bytes or a binary stream |
| write_text(path, text, *, encoding, overwrite, metadata) | WriteResult | Write a string as a file |
ATOMIC_WRITE
Atomic writes use a temp-and-rename strategy: no reader ever sees a partial file, and a crash mid-write leaves the previous version intact.
| Method | Returns | Description |
|---|---|---|
| write_atomic(path, content, *, overwrite, metadata) | WriteResult | Atomic write from bytes or a stream |
| open_atomic(path, *, overwrite, metadata) | context manager → WriteResult | Streaming atomic write; WriteResult returned on __exit__ |
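For a local filesystem, the temp-and-rename strategy named above can be sketched as follows. write_atomic_local is a hypothetical helper written for illustration, not the library's code; it assumes the temp file and the target live on the same filesystem, which is what makes os.replace atomic.

```python
import os
import tempfile


def write_atomic_local(path, data):
    """Write `data` to `path` so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)  # atomic swap of the directory entry
    except BaseException:
        # The previous version of `path` is untouched; only the temp file is removed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Remote backends typically need a different mechanism (for example a server-side rename or a single-request upload), so this sketch only covers the local case.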
DELETE
| Method | Returns | Description |
|---|---|---|
| delete(path, *, missing_ok) | None | Remove a file; missing_ok=True suppresses NotFound |
| delete_folder(path, *, recursive) | None | Remove a folder; requires recursive=True if non-empty |
LIST
| Method | Returns | Description |
|---|---|---|
| list_files(path, *, max_depth, pattern) | Iterator[str] | Enumerate file paths under a prefix; optional pattern= filters basenames via fnmatch |
| list_folders(path, *, max_depth, pattern) | Iterator[str] | Enumerate immediate subfolder paths; optional pattern= filters folder basenames via fnmatch |
| iter_children(path) | Iterator[FileInfo \| FolderInfo] | Iterate files and folders together, with metadata |
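Basename filtering via fnmatch, as mentioned for pattern=, works roughly like this. filter_by_basename is a hypothetical helper, and the choice of fnmatchcase (case-sensitive on every platform) is an assumption; the text only says that fnmatch is used.

```python
import fnmatch
import posixpath


def filter_by_basename(paths, pattern=None):
    """Yield paths whose final component matches `pattern` (all paths if None)."""
    for path in paths:
        # Only the last path segment is tested, never the directory part.
        if pattern is None or fnmatch.fnmatchcase(posixpath.basename(path), pattern):
            yield path


paths = ["logs/2024/app.log", "logs/2024/app.txt", "log.d/readme.md"]
matched = list(filter_by_basename(paths, "*.log"))
```

Because only the basename is compared, a directory named log.d does not cause a match for "*.log".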
GLOB
| Method | Returns | Description |
|---|---|---|
| glob(pattern) | Iterator[str] | Native glob pattern matching (*, **, ?) on file paths |
Backends without native GLOB capability can use the ext.glob extension as a
portable fallback.
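A portable fallback of this kind usually lists keys and filters them client-side. The sketch below is one simplified way to do that, assuming "/"-separated keys and only the *, ** and ? wildcards; glob_to_regex and glob_filter are hypothetical names and do not describe how ext.glob is implemented.

```python
import re


def glob_to_regex(pattern):
    """Translate a glob using *, ** and ? into a compiled regex over '/'-separated keys."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:[^/]+/)*")  # zero or more whole directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # trailing **: anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # stays inside one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out) + r"\Z")


def glob_filter(paths, pattern):
    """Return the paths that match `pattern`, preserving input order."""
    regex = glob_to_regex(pattern)
    return [p for p in paths if regex.match(p)]


keys = ["a.csv", "data/a.csv", "data/2024/b.csv", "data/2024/b.txt"]
all_csv = glob_filter(keys, "**/*.csv")
```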
MOVE / COPY
| Method | Returns | Description |
|---|---|---|
| move(src, dst, *, overwrite) | None | Rename or relocate within the same backend |
| copy(src, dst, *, overwrite) | None | Duplicate a file within the same backend |
METADATA
| Method | Returns | Description |
|---|---|---|
| head(path) | WriteResult | File metadata without reading content (size, etag, last_modified, etc.) |
| get_file_info(path) | FileInfo | Full file metadata record |
| get_folder_info(path) | FolderInfo | Folder metadata with aggregate file count and total size |
Data Models
All models are importable from remote_store and are frozen dataclasses.
WriteResult
Returned by every write method and by head().
| Field | Type | Populated when |
|---|---|---|
| path | str | Always |
| size | int | Always |
| source | WriteSource | Always (NativeSource, BasicSource, or SidecarSource) |
| digest | ContentDigest \| None | Backend declares WRITE_RESULT_NATIVE |
| etag | str \| None | Backend declares WRITE_RESULT_NATIVE |
| version_id | str \| None | Backend declares WRITE_RESULT_NATIVE (S3 versioning, Azure) |
| last_modified | datetime \| None | Backend declares WRITE_RESULT_NATIVE |
| metadata | dict[str, str] | Echo of caller's metadata= input |
source signals where the rich fields came from:
NativeSource = populated from the backend's write response;
BasicSource = populated from a post-write stat/head;
SidecarSource = populated by the ext.write hash helper.
FileInfo
Returned by get_file_info() and yielded by iter_children().
| Field | Type |
|---|---|
| path | str |
| size | int |
| modified_at | datetime \| None |
| etag | str \| None |
| version_id | str \| None |
| content_type | str \| None |
| metadata | dict[str, str] |
Other models
| Model | Description |
|---|---|
| FolderInfo | path, file_count, total_size; returned by get_folder_info() |
| ContentDigest | algorithm (e.g. "crc32", "sha256") + value (hex) |
| ResolutionPlan | Backend type, resolved path, and options; returned by resolve() |
| BackendConfig | type string + options dict; one entry in a RegistryConfig |
| RegistryConfig | Named backends and stores; the entry point for config-driven setup |
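To make the "frozen dataclass" and "algorithm + hex value" ideas concrete, here is a small stand-in. Digest and digest_stream are hypothetical names chosen for this sketch; the field names mirror the ContentDigest row above, but nothing here is taken from the library's source.

```python
import hashlib
import io
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Digest:
    """Stand-in for an (algorithm, hex value) digest record."""

    algorithm: str
    value: str  # lowercase hex string


def digest_stream(stream, algorithm="sha256", chunk_size=65536):
    """Hash a binary stream chunk by chunk without loading it fully into memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return Digest(algorithm, hasher.hexdigest())


digest = digest_stream(io.BytesIO(b"abc"))

# Frozen dataclasses reject attribute assignment after construction.
try:
    digest.value = "tampered"
    mutated = True
except FrozenInstanceError:
    mutated = False
```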
Capabilities
Capabilities are declared by backends at construction time. Two flavours exist:
- Method gates: the capability is a hard prerequisite for calling the method. Without it, store.supports() returns False and the method raises CapabilityNotSupported if called.
- Quality flags: the capability signals that a method delivers a stronger guarantee. The method is available regardless; the flag lets callers decide whether to rely on the native behaviour or fall back to an extension.
| Capability | Flavour | Gated / signalled methods | Notes |
|---|---|---|---|
| READ | Gate | read(), read_bytes(), read_text(), read_seekable() | Declared by all built-in backends |
| WRITE | Gate | write(), write_text() | Declared by all built-in backends except http and sql-query |
| DELETE | Gate | delete(), delete_folder() | Declared by all built-in backends except http and sql-query |
| LIST | Gate | list_files(), list_folders(), iter_children() | Declared by all built-in backends except http |
| GLOB | Gate | glob() | Not declared by http, memory, or sftp; use ext.glob as fallback |
| MOVE | Gate | move() | Declared by all built-in backends except http and sql-query |
| COPY | Gate | copy() | Declared by all built-in backends except http and sql-query |
| ATOMIC_WRITE | Gate | write_atomic(), open_atomic() | Declared by all built-in backends except http and sql-query |
| METADATA | Gate | head(), get_file_info(), get_folder_info() | Declared by all built-in backends |
| SEEKABLE_READ | Quality flag | read() | read() returns a natively seekable stream |
| LAZY_READ | Quality flag | read() | read() fetches data lazily; partial reads avoid loading the full file |
| ATOMIC_MOVE | Quality flag | move() | move() is crash-safe under concurrent access |
| WRITE_RESULT_NATIVE | Quality flag | write*(), head() | Rich WriteResult fields (etag, digest, version_id, last_modified) come from the backend's own write response |
| USER_METADATA | Strict gate | metadata= kwarg on write methods | A non-empty metadata= raises CapabilityNotSupported when the capability is not declared |
Backends
Built-in backends cover the most common targets. All implement the same
Backend ABC; SFTPBackend, S3PyArrowBackend, and the SQL backends expose
additional backend-specific options via BackendConfig.options.
| Type | Class | Extra | Capabilities |
|---|---|---|---|
| azure | AzureBackend | remote-store[azure] | All except ATOMIC_MOVE, SEEKABLE_READ |
| http | ReadOnlyHttpBackend | — (stdlib; requests/httpx optional) | LAZY_READ, METADATA, READ |
| local | LocalBackend | — | All |
| memory | MemoryBackend | — | All except GLOB, LAZY_READ |
| s3 | S3Backend | remote-store[s3] | All except ATOMIC_MOVE |
| s3-pyarrow | S3PyArrowBackend | remote-store[s3-pyarrow] | All except ATOMIC_MOVE |
| sftp | SFTPBackend | remote-store[sftp] | All except ATOMIC_MOVE, GLOB |
| sql-blob | SQLBlobBackend | remote-store[sql] | All except LAZY_READ |
| sql-query | SQLQueryBackend | remote-store[sql-query] | GLOB, LIST, METADATA, READ, SEEKABLE_READ |
Write-result quality flags by backend:
| Backend | WRITE_RESULT_NATIVE | USER_METADATA |
|---|---|---|
| azure | Yes | Yes |
| http | — | — |
| local | Yes | — |
| memory | Yes | Yes |
| s3 | Yes | Yes |
| s3-pyarrow | Yes | — |
| sftp | Yes | — |
| sql-blob | Yes (requires modified_at column) | Yes (requires user_metadata column) |
| sql-query | — | — |
Configuration
RegistryConfig decouples storage topology from application code.
Define named backends and stores in a config file; application code
calls registry.store("name") and never sees connection strings.
Credentials in standard-looking keys (password, secret_key, etc.)
are wrapped in Secret automatically and masked in logs.

    [backends.primary]
    type = "s3"
    options.bucket = "my-bucket"
    options.access_key = "${AWS_ACCESS_KEY_ID}"
    options.secret_key = "${AWS_SECRET_ACCESS_KEY}"

    [stores.data]
    backend = "primary"
    root = "data/"

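The automatic credential wrapping mentioned above can be pictured with a small stand-in. MaskedSecret, its reveal() method, SENSITIVE_KEYS, and wrap_secrets are all hypothetical names for this sketch; the text only gives password and secret_key as examples of recognised keys, so the key set here is an assumption.

```python
class MaskedSecret:
    """Stand-in wrapper that hides its value in repr() and str()."""

    def __init__(self, value):
        self._value = value

    def reveal(self):
        # Hypothetical accessor; the real Secret API is not shown in the text.
        return self._value

    def __repr__(self):
        return "MaskedSecret('***')"

    __str__ = __repr__


# Assumption: only the two keys named in the text are treated as sensitive.
SENSITIVE_KEYS = frozenset({"password", "secret_key"})


def wrap_secrets(options):
    """Return a copy of `options` with sensitive string values wrapped."""
    return {
        key: MaskedSecret(value) if key in SENSITIVE_KEYS and isinstance(value, str) else value
        for key, value in options.items()
    }


wrapped = wrap_secrets({"bucket": "my-bucket", "secret_key": "s3cr3t"})
log_line = f"connecting with {wrapped}"
```

Formatting the options into a log line goes through repr(), so the raw value never appears unless reveal() is called explicitly.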
| API | Location | Description |
|---|---|---|
| RegistryConfig.from_dict(data) | remote_store | Construct from a plain dict |
| RegistryConfig.from_toml(path, *, table, resolve_env_vars) | remote_store | Load from a TOML file |
| from_yaml(path, *, resolve_env_vars) | remote_store.ext.yaml | Load from a YAML file |
| from_pydantic(model) | remote_store.ext.pydantic | Convert a Pydantic settings model |
| resolve_env(data, *, environ) | remote_store | Resolve ${VAR} / ${VAR:-default} placeholders in a config dict |
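Placeholder resolution of the ${VAR} / ${VAR:-default} form can be sketched with a regular expression. resolve_placeholders is a hypothetical function, not the library's resolve_env; in particular, this sketch applies the default only when the variable is unset and raises KeyError when neither a value nor a default exists, both of which are assumptions.

```python
import re

# ${NAME} or ${NAME:-default}; the default may be empty but cannot contain '}'.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(?::-([^}]*))?\}")


def _lookup(match, environ):
    name, default = match.group(1), match.group(2)
    if name in environ:
        return environ[name]
    if default is not None:
        return default
    raise KeyError(f"environment variable {name!r} is not set and has no default")


def resolve_placeholders(data, environ):
    """Recursively substitute placeholders in the strings of a nested config."""
    if isinstance(data, dict):
        return {key: resolve_placeholders(value, environ) for key, value in data.items()}
    if isinstance(data, list):
        return [resolve_placeholders(item, environ) for item in data]
    if isinstance(data, str):
        return _PLACEHOLDER.sub(lambda m: _lookup(m, environ), data)
    return data


config = {
    "backends": {
        "primary": {
            "type": "s3",
            "options": {"bucket": "${BUCKET:-my-bucket}", "access_key": "${AWS_ACCESS_KEY_ID}"},
        }
    }
}
resolved = resolve_placeholders(config, {"AWS_ACCESS_KEY_ID": "AKIA-EXAMPLE"})
```

Passing the environment as an explicit mapping, rather than reading os.environ inside the function, keeps the sketch easy to test.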
Extensions
Extensions are composable layers on top of Store or utility helpers.
They use only the public Store/Backend API and carry no lifecycle
ownership — they never call store.close().
Always available (base install)
| Extension | Problem it solves | Module | Key exports |
|---|---|---|---|
| Batch | Apply copy/delete/exists across many paths in one call with partial-failure reporting | remote_store.ext.batch | BatchResult, batch_copy, batch_delete, batch_exists |
| Cache | Transparent read-through caching: subsequent reads of the same key go to a fast local store | remote_store.ext.cache | CachedStore, CacheBackend, CacheStats, MemoryCache, cache |
| Glob | Portable glob for backends that do not declare Capability.GLOB | remote_store.ext.glob | glob_files |
| Integrity | Client-side content checksums and verification independent of backend digest support | remote_store.ext.integrity | checksum, verify, verify_hex, content_digest |
| Observe | Structured event hooks for every store operation: logging, metrics, tracing, audit trails | remote_store.ext.observe | ObservedStore, StoreEvent, BufferedObserver, observe, set_correlation_id |
| Partition | Parse and construct Hive-style partition paths (key=value/…) | remote_store.ext.partition | ParsedPartition, parse_partition, partition_path |
| Streams | Wrap any stream with progress callbacks or rolling checksums without buffering | remote_store.ext.streams | ChecksumReader, ChecksumWriter, ProgressReader, ProgressWriter, read_with_progress |
| Transfer | High-level upload / download / store-to-store copy with streaming and progress | remote_store.ext.transfer | upload, download, transfer |
| Write | Guaranteed client-side digest on write regardless of backend capability; atomic write variant | remote_store.ext.write | write_with_hash, open_atomic_with_hash |
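As an illustration of the Hive-style key=value/… layout handled by the Partition extension, the following sketch builds and parses such paths. build_partition_path and split_partition_path are hypothetical helpers with made-up signatures; they are deliberately named differently from the library's partition_path and parse_partition, whose signatures are not given in the text.

```python
def build_partition_path(partitions, filename=None):
    """Join key=value segments (in mapping order) and an optional file name."""
    segments = [f"{key}={value}" for key, value in partitions.items()]
    if filename:
        segments.append(filename)
    return "/".join(segments)


def split_partition_path(path):
    """Split a path into a dict of key=value segments and the remaining segments."""
    partitions = {}
    remainder = []
    for segment in path.split("/"):
        key, sep, value = segment.partition("=")
        if sep and key:
            partitions[key] = value  # values stay strings; no type inference
        elif segment:
            remainder.append(segment)
    return partitions, "/".join(remainder)


path = build_partition_path({"year": "2024", "month": "01"}, "part-0.parquet")
parsed, rest = split_partition_path(path)
```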
Optional (require extras)
| Extension | Problem it solves | Module | Key exports | Extra |
|---|---|---|---|---|
| Arrow | Use any Store as a PyArrow filesystem for Arrow / Parquet tooling | remote_store.ext.arrow | StoreFileSystemHandler, pyarrow_fs | remote-store[arrow] |
| Parquet | Read and write Parquet datasets (single files or partitioned) via PyArrow | remote_store.ext.parquet | ParquetDatasetStore, DatasetManifest | remote-store[arrow] |
| OpenTelemetry | Emit distributed tracing spans for every store operation | remote_store.ext.otel | otel_hooks, otel_observe | remote-store[otel] |
| Pydantic | Derive RegistryConfig from a Pydantic settings model | remote_store.ext.pydantic | from_pydantic | remote-store[pydantic] |
| YAML | Load RegistryConfig from a YAML file | remote_store.ext.yaml | from_yaml | remote-store[yaml] |
| Dagster | IO manager, config-driven Store resource, and compute log manager for Dagster pipelines | remote_store.ext.dagster | RemoteStoreIOManager, DagsterStoreResource, RemoteStoreComputeLogManager, dagster_io_manager | remote-store[dagster] |
Error Model
All errors are subclasses of RemoteStoreError (importable from remote_store).
Backend-native exceptions are mapped at the adapter boundary — callers
always receive a typed error, never an S3ServiceError or azure.core.….
| Error | Raised when |
|---|---|
| NotFound | File or folder does not exist |
| AlreadyExists | Target path already exists and overwrite=False |
| InvalidPath | Path is malformed or points at the wrong node type (e.g. a directory where a file is expected) |
| CapabilityNotSupported | A method is called without the required capability |
| BackendUnavailable | The backend is unreachable (network, auth, service down) |
| PermissionDenied | Caller lacks access rights |
| DirectoryNotEmpty | Directory is not empty and the operation requires it to be |
| RemoteStoreError | Base class for all errors above |
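Mapping native exceptions at the adapter boundary is commonly done with a decorator around each backend call. The sketch below shows that idea with stand-in error classes and built-in OS exceptions; StoreError, maps_native_errors, read_local, and the mapping table are hypothetical and only loosely mirror the names in the table above.

```python
import functools


class StoreError(Exception):
    """Stand-in base class for typed errors."""


class NotFound(StoreError):
    pass


class PermissionDenied(StoreError):
    pass


class BackendUnavailable(StoreError):
    pass


# Assumed mapping for a filesystem-like adapter; checked in order.
_NATIVE_TO_TYPED = (
    (FileNotFoundError, NotFound),
    (PermissionError, PermissionDenied),
    (ConnectionError, BackendUnavailable),
)


def maps_native_errors(func):
    """Translate native exceptions into typed errors, keeping the original as __cause__."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StoreError:
            raise  # already typed
        except Exception as exc:
            for native, typed in _NATIVE_TO_TYPED:
                if isinstance(exc, native):
                    raise typed(str(exc)) from exc
            raise StoreError(str(exc)) from exc

    return wrapper


@maps_native_errors
def read_local(path):
    with open(path, "rb") as handle:
        return handle.read()


try:
    read_local("no-such-dir/no-such-file.bin")
except NotFound as err:
    caught = err
```

Chaining with `from exc` keeps the native exception available for debugging while callers only need to handle the typed hierarchy.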
Async API
remote_store.aio provides native async/await support and two bridge
adapters for mixing sync and async backends.
Native async classes:
| Class | Description |
|---|---|
| AsyncStore | Async counterpart to Store; coroutine methods for all operations; write*() returns WriteResult and accepts metadata= |
| AsyncBackend | ABC for native async backends |
| AsyncMemoryBackend | In-memory async backend (for testing) |
| AsyncAzureBackend | Native async Azure backend via Azure SDK async clients |
| AsyncWritableContent | Type alias: bytes \| AsyncIterator[bytes] |
Bridge adapters — when you need to cross the sync/async boundary:
| Class | Direction | Mechanism |
|---|---|---|
| SyncBackendAdapter | sync Backend → usable in async code | Dispatches each call via asyncio.to_thread (thread-pool executor) |
| AsyncBackendSyncAdapter | async AsyncBackend → usable in sync code | Runs the async backend on a dedicated daemon-thread event loop |
Both adapters translate capabilities faithfully and are covered by the same conformance suite as native backends.
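The two bridging mechanisms named in the table can be demonstrated with asyncio alone. SyncToAsync and AsyncToSync are hypothetical wrappers around plain functions, written for illustration; they are not the library's adapters and skip capability translation entirely.

```python
import asyncio
import threading


class SyncToAsync:
    """Run a blocking callable in a worker thread so async code is not blocked."""

    def __init__(self, func):
        self._func = func

    async def __call__(self, *args, **kwargs):
        return await asyncio.to_thread(self._func, *args, **kwargs)


class AsyncToSync:
    """Run coroutines on a private event loop that lives in a daemon thread."""

    def __init__(self):
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        # Schedule on the background loop and block until the result is ready.
        return asyncio.run_coroutine_threadsafe(coro, self._loop).result()

    def close(self):
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


def blocking_read(path):
    return f"sync:{path}"


async def async_read(path):
    await asyncio.sleep(0)
    return f"async:{path}"


from_async = asyncio.run(SyncToAsync(blocking_read)("a.txt"))

bridge = AsyncToSync()
from_sync = bridge.run(async_read("b.txt"))
bridge.close()
```

A dedicated loop thread lets synchronous callers reuse one event loop across calls instead of creating a new loop each time.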
Async extensions:
| Module | Description |
|---|---|
| aio.ext.write | write_with_hash: client-side SHA-256 checksumming on async write paths |
Install extras

    pip install remote-store[arrow]       # PyArrow filesystem bridge + Parquet extension
    pip install remote-store[azure]       # Azure ADLS Gen2 via Azure SDK
    pip install remote-store[dagster]     # Dagster IO manager
    pip install remote-store[httpx]       # httpx HTTP adapter for ReadOnlyHttpBackend
    pip install remote-store[otel]        # OpenTelemetry distributed tracing
    pip install remote-store[pydantic]    # Pydantic settings integration
    pip install remote-store[requests]    # requests HTTP adapter for ReadOnlyHttpBackend
    pip install remote-store[s3]          # S3 via s3fs
    pip install remote-store[s3-pyarrow]  # S3 via PyArrow C++ filesystem
    pip install remote-store[sftp]        # SFTP via paramiko
    pip install remote-store[sql]         # SQL blob store via SQLAlchemy
    pip install remote-store[sql-query]   # SQL query store via SQLAlchemy + PyArrow
    pip install remote-store[toml]        # TOML config (stdlib on Python 3.11+)
    pip install remote-store[yaml]        # YAML config loading

Each extra declares a floor in pyproject.toml and in most cases
deliberately no ceiling (the arrow and sql-query extras carry a
pyarrow<25 ceiling). For the exact upper-bound versions CI was last
green against, see Tested upper-bound versions.