Research: Backend Health Check (`store.ping()` / `backend.check_health()`)¶

Item ID: ID-054
Date: 2026-03-09
Context: Lightweight, non-destructive health verification for backends; startup gates and liveness probes.

1. Overview and Motivation¶

A health check method verifies that a backend is reachable and credentials are valid without performing any data operations. This enables:

Startup gates: Fail fast if credentials are invalid before accepting traffic
Liveness probes: Kubernetes, container orchestrators, monitoring systems
Connection validation: Application initialization / bootstrap logic
Operational hygiene: Verify before critical operations

The operation must be non-destructive (no side effects), lightweight (minimal I/O), and portable across all backends.
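As an illustration of the startup-gate use case, the following is a minimal sketch, not part of the proposed API. It assumes a store object exposing the ping() method proposed in this document; the startup_gate helper, its retry parameters, and the FlakyStore demo class are hypothetical, and the exception classes are local stand-ins for the library's error types.

```python
import time


# Stand-ins for the library's error types, stubbed so the sketch runs alone.
class PermissionDenied(Exception): ...
class NotFound(Exception): ...
class BackendUnavailable(Exception): ...


def startup_gate(store, attempts=3, delay=0.5, sleep=time.sleep):
    """Return once store.ping() succeeds; re-raise if the backend stays down.

    Only BackendUnavailable is retried: bad credentials or a missing root
    will not fix themselves between attempts, so those propagate at once.
    """
    for attempt in range(1, attempts + 1):
        try:
            store.ping()
            return
        except BackendUnavailable:
            if attempt == attempts:
                raise
            sleep(delay)


class FlakyStore:
    """Hypothetical store whose first `failures` pings fail (demo only)."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def ping(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise BackendUnavailable("connection refused")
```

Retrying only the transient error keeps the gate fail-fast for configuration mistakes while tolerating a backend that is still starting up.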

2. Design Constraints and Principles¶

2.1 Non-Destructive¶

Must not create, modify, or delete any data
Must not have observable side effects on the backend
Read-only or pure metadata operations only

2.2 Lightweight¶

Should complete in under 1–2 seconds on a healthy backend
Minimal network round-trips (preferably a single call)
No listing, streaming, or enumeration

2.3 Portable Semantics¶

Uniform API across all backends (Store + all 6 backends)
Failure modes standardized: success (healthy), exception (unhealthy)
Credential validation implicit (failures = bad credentials or connectivity)

2.4 Clear Failure Semantics¶

Success = backend is reachable and credentials work
Raises exception on any connectivity or credential issue
Does not return a bool or status enum — follows remote-store convention of raising exceptions for error conditions
Note: LocalBackend.__init__ calls self._root.mkdir(parents=True, exist_ok=True), so the root always exists after construction. A NotFound for Local would only occur if the directory is deleted between construction and ping() — an unusual but valid edge case to handle

3. Backend-Specific Strategies¶

3.1 Local Backend¶

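Method: existence check on the root path, then os.access(root_path, os.R_OK)

Verifies the root path exists and is readable
Non-destructive, instant
os.access() returns a bool rather than raising, and returns False for a missing path as well as an unreadable one, so existence is checked first: raise NotFound if the path is missing, PermissionDenied if it exists but is not readable

Implementation Notes:
- Works for any root path (file or folder)
- Natural fit for how Local backend validates paths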

3.2 S3 Backend¶

Method: HeadBucket via the client underlying s3fs

Lightweight metadata call, no data transfer
Validates bucket exists and credentials have permission
s3fs wraps botocore (aiobotocore in recent releases), so the client at self._fs.s3 exposes coroutine methods; from synchronous code, issue the call through s3fs's wrapper, e.g. self._fs.call_s3("head_bucket", Bucket=self._bucket)
Or use self._fs.info(self._bucket) for direct stat-like metadata

Implementation Notes:
- S3Backend uses self._fs (s3fs.S3FileSystem), not a raw boto3 client
- HeadBucket through s3fs: self._fs.call_s3("head_bucket", Bucket=self._bucket) (verify against the pinned s3fs version)
- Or lightweight info call: self._fs.info(self._bucket) returns metadata
- Preferred over checking for root path existence (which would require listing)

Error Mapping:
- 403 / AccessDenied → PermissionDenied
- 404 / NoSuchBucket → NotFound
- Timeout / connection → BackendUnavailable

3.3 S3-PyArrow Backend¶

Method: PyArrow S3FileSystem.get_file_info() on bucket root

Uses PyArrow's native S3FileSystem, not boto3
self._pa_fs.get_file_info(self._bucket) returns lightweight metadata
Validates bucket exists and credentials work
get_file_info() does not raise for a missing path; it returns a FileInfo whose type is FileType.NotFound, which check_health() must translate into NotFound

Implementation Notes:
- S3-PyArrow wraps PyArrow's S3FileSystem, not boto3
- Lightweight metadata call via get_file_info() on bucket name or root path
- Error handling consistent with S3 Backend (map to same exception types)

3.4 SFTP Backend¶

Method: stat(root_path) or equivalent

Checks if root path exists via os.stat()-like call
Paramiko provides sftp.stat(path) — returns stat info
Also validates SSH connectivity and credentials

Implementation Notes:
- sftp.stat(root_path) already used in existence checks
- Non-destructive: just metadata lookup
- Error mapping: EIO / connection → BackendUnavailable, EACCES → PermissionDenied, ENOENT → NotFound
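The errno mapping above could look roughly like the sketch below. It assumes a client object with a Paramiko-style stat(path) method that raises OSError with errno set for missing paths and permission failures; check_sftp_health is a hypothetical free function rather than the backend method, and the exception classes are local stand-ins. Session-level errors that are not OSError subclasses would need separate handling.

```python
import errno


# Stand-ins for the library's error types, stubbed so the sketch runs alone.
class PermissionDenied(Exception): ...
class NotFound(Exception): ...
class BackendUnavailable(Exception): ...


def check_sftp_health(sftp, root_path):
    """Stat the root path once and translate OSError into the error model.

    `sftp` is any object with a Paramiko-style stat(path) method.
    """
    try:
        sftp.stat(root_path)
    except OSError as exc:
        if exc.errno == errno.ENOENT:
            raise NotFound(f"Root path does not exist: {root_path}") from exc
        if exc.errno == errno.EACCES:
            raise PermissionDenied(f"No access to {root_path}") from exc
        # EIO, socket failures, and errors without an errno all land here.
        raise BackendUnavailable(f"SFTP health check failed: {exc}") from exc
```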

3.5 Azure Blob Storage / Data Lake Storage¶

Method: HNS-aware container/filesystem properties lookup

Non-HNS (standard Blob):
- ContainerClient.get_container_properties() or equivalent (ContainerClient.exists())
- Non-destructive metadata call
- Validates container exists and credentials work

HNS (Data Lake):
- FileSystemClient.get_file_system_properties() or equivalent (exists()); in the Python SDK (azure.storage.filedatalake) the filesystem client class is named FileSystemClient
- Similar lightweight metadata call
- Validates filesystem exists and credentials work

Implementation Notes:
- AzureBackend detects HNS mode via self._hns flag and branches accordingly
- Non-HNS: ContainerClient.get_container_properties() is lightweight
- HNS: FileSystemClient.get_file_system_properties() (reuses existing folder stat logic)
- If exists() is used instead, it returns a bool, so a False result must be raised as NotFound explicitly
- Error mapping (both modes): 403 → PermissionDenied, 404 → NotFound, connection → BackendUnavailable
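A rough sketch of the HNS-aware branching follows. It is duck-typed so it runs without the Azure SDK: client is assumed to be a ContainerClient-like or FileSystemClient-like object, and failures are classified by a status_code attribute as carried by the SDK's HTTP error types. check_azure_health is a hypothetical helper, the exception classes are local stand-ins, and treating 401 like 403 is an assumption beyond the mapping listed above.

```python
# Stand-ins for the library's error types, stubbed so the sketch runs alone.
class PermissionDenied(Exception): ...
class NotFound(Exception): ...
class BackendUnavailable(Exception): ...


def check_azure_health(client, hns):
    """Issue one properties call, chosen by HNS mode, and map failures.

    `client` is a FileSystemClient-like object when hns is True and a
    ContainerClient-like object otherwise.
    """
    try:
        if hns:
            client.get_file_system_properties()
        else:
            client.get_container_properties()
    except Exception as exc:
        # HTTP errors carry a status_code; transport errors do not.
        status = getattr(exc, "status_code", None)
        if status in (401, 403):  # 401 is an assumption, see lead-in
            raise PermissionDenied(str(exc)) from exc
        if status == 404:
            raise NotFound(str(exc)) from exc
        raise BackendUnavailable(str(exc)) from exc
```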

3.6 Memory Backend¶

Method: Always succeeds

In-memory backend is always "healthy" by definition
Return immediately without any checks
Could optionally verify that root path exists in the tree, but not required

Implementation Notes:
- Safe to always return success; there are no external resources to validate

4. API Design¶

Option A: `store.ping()` (Recommended)¶

# Store level
def ping(self) -> None:
    """
    Verify the backend is reachable and credentials are valid.

    Non-destructive: performs no data operations, creates no side effects.
    Lightweight: single metadata call per backend.

    Raises
    ------
    PermissionDenied
        If credentials are invalid or insufficient.
    NotFound
        If the backend root path / container / filesystem does not exist.
    BackendUnavailable
        If the backend is unreachable (network, timeout, etc.).

    Examples
    --------
    >>> store = Store(s3_backend, root_path="data")
    >>> store.ping()  # Raises on error, succeeds silently

    >>> try:
    ...     store.ping()
    ... except (PermissionDenied, NotFound, BackendUnavailable) as e:
    ...     print(f"Backend unhealthy: {e}")
    """

Rationale:
- Familiar terminology from web/API health checks
- Short, memorable method name
- Consistent with Unix/network convention

Option B: `backend.check_health()` (Alternative)¶

# Backend ABC level
def check_health(self) -> None:
    """
    Verify the backend is healthy (reachable, credentials valid).
    """

Tradeoff:
- More explicit about intent
- Longer name
- Less standard in Python ecosystem

Design Decision: Both¶

Expose at Store level as ping() (user-facing)
Implement at Backend ABC level as check_health() (backend contract)
Store delegates to backend

5. Error Mapping¶

Health checks should raise existing error types from the error model (005-error-model.md):

| Condition | Exception Type | Details |
| --- | --- | --- |
| Credentials invalid | `PermissionDenied` | Backend explicitly rejects credentials |
| Root path missing | `NotFound` | Bucket, container, FS, or directory does not exist |
| Network unreachable | `BackendUnavailable` | Timeout, connection refused, DNS failure |
| Other backend error | `BackendUnavailable` | Generic fallback for unmapped errors |

No new error types needed — existing error model covers all scenarios.
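The table, including its generic fallback row, can be expressed as a small translation layer. The sketch below is a hypothetical context manager in the spirit of the self._errors() helper used in the appendix pseudocode; it is not that helper's actual implementation. It maps only standard-library exception types, and the exception classes are local stand-ins.

```python
from contextlib import contextmanager


# Stand-ins for the library's error types, stubbed so the sketch runs alone.
class PermissionDenied(Exception): ...
class NotFound(Exception): ...
class BackendUnavailable(Exception): ...


@contextmanager
def health_errors():
    """Translate exceptions raised inside the block into the error model."""
    try:
        yield
    except (PermissionDenied, NotFound, BackendUnavailable):
        raise  # already in the error model, pass through unchanged
    except PermissionError as exc:
        raise PermissionDenied(str(exc)) from exc
    except FileNotFoundError as exc:
        raise NotFound(str(exc)) from exc
    except Exception as exc:
        # Timeouts, refused connections, DNS failures, and anything unmapped.
        raise BackendUnavailable(str(exc)) from exc
```

Ordering matters here: the specific handlers come before the catch-all so that only genuinely unmapped errors fall through to BackendUnavailable.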

6. Integration Points¶

6.1 Store Lifecycle¶

Optional in __init__() or shortly after for fail-fast bootstrap
Separate call — not automatic (user controls when to verify)
No dependency on Registry — Store can call ping() independently

6.2 Registry Health¶

Could add Registry.ping() (or ping_all(), see Phase 3) to verify all registered backends
Iterates over all registered backends and pings each one
Raises the first error encountered, or succeeds silently

6.3 Observability Integration¶

ext.observe hooks can wrap health checks for monitoring
OpenTelemetry span for ping() call
Metrics: health check latency, failure rate
Logging: debug-level entry/exit, info-level on failure
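The logging and latency items could be sketched as a plain wrapper, shown below with the standard library only. This is not the ext.observe hook API, whose shape is not described here; observed_ping, the logger name, and the injectable clock are illustrative assumptions.

```python
import logging
import time

logger = logging.getLogger("remote_store.health")  # logger name is illustrative


def observed_ping(store, clock=time.perf_counter):
    """Call store.ping(), logging at debug level and failures at info level.

    Returns the elapsed time in seconds on success so a caller can feed it
    into a latency metric; on failure the original exception is re-raised.
    """
    start = clock()
    logger.debug("ping start")
    try:
        store.ping()
    except Exception as exc:
        logger.info("ping failed after %.3fs: %s", clock() - start, exc)
        raise
    elapsed = clock() - start
    logger.debug("ping ok in %.3fs", elapsed)
    return elapsed
```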

6.4 No Tight Coupling¶

Health check is independent of existing Store/Backend methods
Does not affect caching (ext.cache)
Does not interact with batch operations
Does not require capabilities (not capability-gated)

7. Testing Strategy¶

7.1 Unit Tests per Backend¶

For each backend, test:
- Success case: Healthy backend → no exception
- PermissionDenied: Invalid credentials → raises PermissionDenied
- NotFound: Missing bucket/path → raises NotFound
- BackendUnavailable: Mock network failure → raises BackendUnavailable

7.2 Conformance Tests¶

In test_conformance.py, add:
- test_check_health_success(): all backends pass when healthy
- test_check_health_missing_root(): NotFound when root missing (skip Memory)
- Error injection tests per backend via mock
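As a rough idea of what the mock-based cases might look like, the sketch below uses unittest.mock against a minimal stand-in Store that only delegates ping() to check_health(). The real Store, the real conformance fixtures, and the NotFound class are stubbed here as assumptions; only the two test names come from the list above.

```python
from unittest.mock import Mock


class NotFound(Exception):
    """Stand-in for the library's NotFound error."""


class Store:
    """Minimal stand-in for the real Store: ping() only delegates."""

    def __init__(self, backend):
        self._backend = backend

    def ping(self):
        self._backend.check_health()


def test_check_health_success():
    backend = Mock()
    backend.check_health.return_value = None
    assert Store(backend).ping() is None
    backend.check_health.assert_called_once_with()


def test_check_health_missing_root():
    backend = Mock()
    backend.check_health.side_effect = NotFound("bucket missing")
    try:
        Store(backend).ping()
    except NotFound:
        return  # expected: the backend error reaches the caller unchanged
    raise AssertionError("expected NotFound")
```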

7.3 Integration Tests¶

DockerBackend fixtures: real MinIO, Azurite, SFTP
Verify actual latency against the Section 2.2 budget (under 1–2 seconds on a healthy backend)

7.4 Examples¶

Simple script: examples/health_check.py or section in existing example
Shows Store + health check pattern for startup validation

8. Specification Outline (for ADR / Spec)¶

Spec Structure¶

Spec: sdd/specs/025-health-check.md (tentative number)

Sections:

Overview — use cases and design constraints
Store API — store.ping() signature and semantics
Backend ABC — Backend.check_health() contract
Per-backend implementation — strategies and error mapping
Error Handling — detailed error types and conditions
Integration — observability, Registry, lifecycle
Non-requirements — what health checks do NOT do
Testing — unit, conformance, integration, examples

Traceability Markers:
- PING-001 through PING-NNN for spec requirements
- Store method: STORE-016 (verify at spec authoring; historical note: STORE-015 was once duplicated in spec 001-store-api.md across native_path() and glob(); resolved under BK-250, glob() is now STORE-018)
- Backend method: BE-026 (next available; BE-025 is native_path())

9. Implementation Roadmap¶

Phase 1: Core¶

Add Backend.check_health() ABC method (all 6 backends implement)
Implement per-backend logic (Local, S3, S3-PyArrow, SFTP, Azure, Memory)
Add Store.ping() delegation
Unit tests per backend + conformance suite
Spec + ADR

Phase 2: Observability¶

ext.observe wiring (hooks for health checks)
OpenTelemetry span support
Example: examples/health_check.py

Phase 3: Registry Integration¶

Registry.ping_all() or similar (optional convenience method)
Docs integration

10. Known Considerations and Open Questions¶

10.1 Should health checks be capability-gated?¶

Decision: No.

Rationale:
- Health checks are orthogonal to data operations
- All backends can provide some form of health verification
- Failure should never be due to missing capability, only backend unavailability

10.2 Should health checks cache results?¶

Decision: No, always make a live call.

Rationale:
- Health checks are meant to detect transient failures
- Caching would defeat the purpose
- Caller can implement their own caching if needed
- ext.cache applies to data operations, not health checks
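For callers that do want to rate-limit probes (for example a liveness endpoint polled every second), caller-side caching might look like the sketch below. CachedPing is a hypothetical wrapper outside the library; it assumes only that the wrapped object has a ping() method that raises on failure.

```python
import time


class CachedPing:
    """Caller-side wrapper that reuses a successful ping for `ttl` seconds.

    Failures are never cached, so every call after a failure goes back to
    the backend until a ping succeeds again.
    """

    def __init__(self, store, ttl=5.0, clock=time.monotonic):
        self._store = store
        self._ttl = ttl
        self._clock = clock
        self._last_ok = None  # time of the last successful live call

    def ping(self):
        now = self._clock()
        if self._last_ok is not None and now - self._last_ok < self._ttl:
            return  # recent success, skip the live call
        self._store.ping()  # raises on failure, leaving _last_ok untouched
        self._last_ok = now
```

Keeping this outside the library preserves the guarantee that Store.ping() itself always makes a live call.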

10.3 What about timeout configuration?¶

Decision: Deferred; use backend's default timeout.

Rationale:
- Adding timeout parameters complicates the API
- Backends already have timeout configuration
- Can revisit if customers need tunable timeouts

10.4 Should health checks verify read or write capability?¶

Decision: Read only (minimal verification).

Rationale:
- Write verification would require creating/deleting test files (side effects)
- Read (existence, permissions) is sufficient for startup gates and liveness probes
- Users can test write capability separately if needed

10.5 Return type: void (None) vs. bool vs. object?¶

Decision: None (void).

Rationale:
- Consistent with remote-store error convention: success = return silently, failure = raise
- A boolean return would force a catch-all handler inside the check and hide why the backend is unhealthy
- None is idiomatic Python for "operation succeeded"
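Callers that need a boolean, such as an HTTP liveness handler, can derive one from the raising API in a few lines. The is_healthy helper below is a hypothetical caller-side adapter, and the exception classes are local stand-ins for the library's error types.

```python
# Stand-ins for the library's error types, stubbed so the sketch runs alone.
class PermissionDenied(Exception): ...
class NotFound(Exception): ...
class BackendUnavailable(Exception): ...

HEALTH_ERRORS = (PermissionDenied, NotFound, BackendUnavailable)


def is_healthy(store, errors=HEALTH_ERRORS):
    """Return True if store.ping() succeeds, False on a known health error.

    Unexpected exception types still propagate, so programming errors are
    not silently reported as an unhealthy backend.
    """
    try:
        store.ping()
    except errors:
        return False
    return True
```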

11. Comparison with Other Libraries¶

AWS SDK (`boto3`)¶

s3_client.head_bucket() — validates bucket access
Returns response metadata; no exception = success
remote-store aligns with this pattern

Google Cloud (`google-cloud-storage`)¶

bucket.exists() — checks bucket existence
Similar lightweight metadata call

Azure SDK¶

ContainerClient.exists() / get_container_properties()
Both lightweight, non-destructive

fsspec (via s3fs, adlfs, paramiko)¶

No standard health check interface
Each provider has its own validation mechanism
Standardizing the check across backends in remote-store fills this gap

12. Summary and Recommendations¶

Method names: Store.ping() (user-facing), Backend.check_health() (implementation)
Error types: Use existing PermissionDenied, NotFound, BackendUnavailable
Implementation: Per-backend lightweight checks (HeadBucket, stat, exists calls)
No new capabilities: Health checks are always available
Non-destructive: Verify without side effects
Testing: Unit + conformance + integration against real backends
Spec: Separate spec document (025-health-check.md)
Integration: Optional observability hooks via ext.observe
No caching, no timeout parameter, no return value: keep the API minimal and idiomatic

Appendix: Pseudocode¶

Store.ping()¶

def ping(self) -> None:
    """Verify backend is reachable and credentials work."""
    self._backend.check_health()

Backend.check_health() (ABC)¶

@abstractmethod
def check_health(self) -> None:
    """
    Verify the backend is healthy (reachable, credentials valid).

    Raises
    ------
    PermissionDenied
        Credentials invalid or insufficient.
    NotFound
        Root path / container / filesystem does not exist.
    BackendUnavailable
        Backend unreachable (network, timeout, etc.).
    """

LocalBackend.check_health()¶

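def check_health(self) -> None:
    """Verify root path exists and is readable."""
    # Requires `import os` at module level.
    try:
        if not self._root.exists():
            raise NotFound(f"Root path does not exist: {self._root}")
        if not os.access(self._root, os.R_OK):
            raise PermissionDenied(f"No read access to {self._root}")
    except PermissionError as e:
        # On some Python versions exists() raises instead of returning False
        # when a parent directory is not searchable.
        raise PermissionDenied(f"No read access to {self._root}") from e
    except OSError as e:
        # Any other OS-level failure (e.g. I/O error on a network mount)
        raise BackendUnavailable(f"Cannot stat {self._root}: {e}") from e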

S3Backend.check_health()¶

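def check_health(self) -> None:
    """Verify bucket exists and credentials work via HeadBucket."""
    with self._errors():
        # Option A: HeadBucket through s3fs's synchronous wrapper. The client
        # at self._fs.s3 is aiobotocore-based in recent s3fs releases, so
        # calling self._fs.s3.head_bucket(...) directly returns an un-awaited
        # coroutine. Verify call_s3() against the pinned s3fs version.
        self._fs.call_s3("head_bucket", Bucket=self._bucket)
        # Option B: s3fs info call
        # self._fs.info(self._bucket)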

S3PyArrowBackend.check_health()¶

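def check_health(self) -> None:
    """Verify bucket exists and credentials work via PyArrow metadata."""
    with self._errors():
        info = self._pa_fs.get_file_info(self._bucket)
    # A missing path is reported as FileType.NotFound (from pyarrow.fs), not raised
    if info.type == FileType.NotFound:
        raise NotFound(f"Bucket does not exist: {self._bucket}")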

Author notes: This research provides a complete blueprint for implementing health checks. The design is minimal, portable, and consistent with the existing architecture. No blocking issues identified.
