Research: Read-Only HTTP Backend

Item ID: ID-082
Date: 2026-03-15
Status: Research complete, ready for spec consideration

1. Problem Statement

Users already use remote-store for storage (local, S3, SFTP, Azure). Sometimes another kind of "remote stored thing" enters the picture: files hosted at an HTTP URL — government open data portals, dataset registries, static file servers, CDN-hosted assets, package archives, etc.

A ReadOnlyHttpBackend would treat an HTTP endpoint as just another backend. Files behind a URL become accessible through the same Store interface, with the same composability (ext.cache, ext.transfer, ext.observe, ext.batch) that users already rely on for other backends.

Why a backend, not an extension?

An extension cannot provide Store.read() — it would need to reimplement the entire Store interface. A backend slots into the existing architecture naturally: capability gating, error mapping, registry lifecycle, and all extensions work out of the box.

Design constraints

Core package has zero runtime dependencies (dependencies = []).
HTTP library must be optional (urllib from stdlib as baseline, requests or httpx as optional extras).
The backend is read-only — write, delete, move, copy operations raise CapabilityNotSupported.
Must handle real-world HTTP concerns: redirects, content-type, timeouts, auth headers.

2. Capability Profile

Capability	Supported	Notes
READ	Yes	Core value: `GET` request, return body as stream
WRITE	—	Raises `CapabilityNotSupported`
DELETE	—	Raises `CapabilityNotSupported`
LIST	—	No reliable server-side mechanism (see §5)
MOVE	—	Raises `CapabilityNotSupported`
COPY	—	Raises `CapabilityNotSupported`
ATOMIC_WRITE	—	Raises `CapabilityNotSupported`
METADATA	Yes	`HEAD` request -> size, content-type, last-modified, ETag
GLOB	—	No server-side pattern matching

Capability set: {READ, METADATA}

This would be the first backend with only 2 capabilities. The capability system already handles this — Store gates every operation and raises CapabilityNotSupported with clear context.

3. Path Semantics

3.1 Base URL + relative path

The backend takes a base_url at construction. Paths are appended:

backend = ReadOnlyHttpBackend(base_url="https://data.example.com/datasets/")
# store.read("population/2024.csv")
# -> GET https://data.example.com/datasets/population/2024.csv

3.2 The `urljoin` trailing-slash footgun

urllib.parse.urljoin has surprising behavior with trailing slashes:

from urllib.parse import urljoin

urljoin("https://example.com/data", "file.csv")
# -> "https://example.com/file.csv"  (WRONG -- replaces last segment)

urljoin("https://example.com/data/", "file.csv")
# -> "https://example.com/data/file.csv"  (correct)

Mitigation: The constructor normalizes base_url to always end with /. Path construction uses simple string concatenation (base_url + quote(path)) rather than urljoin, avoiding the footgun entirely. urljoin is only needed if we ever support relative ../ paths, which we don't (path validation rejects ..).
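The mitigation can be sketched with the standard library alone. The helper names `normalize_base_url` and `build_url` are hypothetical and chosen for illustration; the real backend may structure this differently.

```python
from urllib.parse import quote, urljoin


def normalize_base_url(base_url: str) -> str:
    """Ensure the base URL ends with exactly one trailing slash."""
    return base_url.rstrip("/") + "/"


def build_url(base_url: str, path: str) -> str:
    """Join base URL and relative path by plain concatenation (no urljoin)."""
    # quote() keeps "/" unescaped, so path separators survive encoding.
    return normalize_base_url(base_url) + quote(path, safe="/")


# The footgun: without a trailing slash, urljoin replaces the last segment.
assert urljoin("https://example.com/data", "file.csv") == "https://example.com/file.csv"

# Concatenation gives the same result with or without the trailing slash.
assert build_url("https://example.com/data", "file.csv") == "https://example.com/data/file.csv"
assert build_url("https://example.com/data/", "a b/file.csv") == "https://example.com/data/a%20b/file.csv"
```

Because path validation already rejects `..`, nothing in this sketch needs the relative-reference resolution that `urljoin` provides.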

3.3 Path validation

Standard remote-store path rules apply (no .., no null bytes, no absolute paths).
The backend URL-encodes paths internally via urllib.parse.quote(path, safe="/") when constructing request URLs. User-visible paths remain unencoded.

3.4 `native_path()` and `to_key()`

native_path(path) -> full URL string (e.g., "https://data.example.com/datasets/population/2024.csv")
to_key(native_path) -> strips base_url prefix, returns relative key
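A minimal sketch of the round trip, written as module-level functions for brevity. Decoding the key with `unquote` in `to_key` and raising `ValueError` for foreign URLs are assumptions; the source only says the prefix is stripped.

```python
from urllib.parse import quote, unquote

# Assumed to be already normalized to end with "/".
BASE_URL = "https://data.example.com/datasets/"


def native_path(path: str) -> str:
    """Return the full, URL-encoded request URL for a store path."""
    return BASE_URL + quote(path, safe="/")


def to_key(url: str) -> str:
    """Strip the base URL and return the unencoded relative key."""
    if not url.startswith(BASE_URL):
        raise ValueError(f"{url!r} is not under {BASE_URL!r}")
    return unquote(url[len(BASE_URL):])
```

With this shape, `to_key(native_path(p))` returns `p` for any valid store path, which keeps user-visible paths unencoded as §3.3 requires.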

4. HTTP Library Strategy

4.1 Tiered approach

Tier	Library	Dependency	Pros	Cons
Baseline	`urllib.request`	stdlib	Zero deps, always available	No connection pooling, clunky API, no async
Standard	`requests`	optional extra	Industry standard, sessions, auth adapters	Sync only, heavy dep tree
Advanced	`httpx`	optional extra	Sync + async, HTTP/2, modern API	Newer, smaller ecosystem

4.2 Recommendation

Single backend, pluggable transport. Rather than separate backends per HTTP library (the S3 vs S3-PyArrow model), use a single ReadOnlyHttpBackend that auto-detects the best available library at init:

If httpx is installed -> use it (best feature set)
Else if requests is installed -> use it (most common)
Else -> fall back to urllib.request (always available)

User can override: ReadOnlyHttpBackend(base_url=..., http_client="urllib").

Rationale: Unlike S3 vs S3-PyArrow (which have fundamentally different I/O models and performance profiles), the HTTP libraries are functionally interchangeable for our needs. One backend with swappable transport is simpler than three backends.
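The detection order above could look roughly like the following. The function name `detect_http_client` and its error behaviour are illustrative assumptions; only the preference order and the `http_client` override values come from this document.

```python
from __future__ import annotations

from importlib.util import find_spec

_KNOWN_CLIENTS = ("httpx", "requests", "urllib")


def detect_http_client(override: str | None = None) -> str:
    """Pick the transport name: explicit override first, then best available."""
    if override is not None:
        if override not in _KNOWN_CLIENTS:
            raise ValueError(f"unknown http_client {override!r}")
        # urllib ships with Python; the others must be importable.
        if override != "urllib" and find_spec(override) is None:
            raise ImportError(f"http_client={override!r} requested but not installed")
        return override
    for name in ("httpx", "requests"):
        if find_spec(name) is not None:
            return name
    return "urllib"
```

`find_spec` checks importability without importing the package, so detection stays cheap and the core keeps its zero-dependency import path.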

4.3 Transport abstraction

Internal protocol (not user-facing):

from __future__ import annotations  # lets HttpTransport reference HttpResponse before it is defined

from dataclasses import dataclass
from typing import BinaryIO, Protocol

class HttpTransport(Protocol):
    def get(self, url: str, headers: dict[str, str], timeout: float) -> HttpResponse: ...
    def head(self, url: str, headers: dict[str, str], timeout: float) -> HttpResponse: ...
    def close(self) -> None: ...

@dataclass
class HttpResponse:
    status: int
    headers: dict[str, str]
    body: BinaryIO  # streaming body (only meaningful for GET)

Three implementations: UrllibTransport, RequestsTransport, HttpxTransport.
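As a rough illustration of the baseline tier, a urllib-backed transport might look like this. It is a sketch under assumptions: header names are lowercased for uniform lookup, and `HTTPError` is returned as a normal response so that status mapping can happen in the backend rather than in the transport.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import BinaryIO


@dataclass
class HttpResponse:
    status: int
    headers: dict[str, str]
    body: BinaryIO


class UrllibTransport:
    """Stdlib-only transport; structurally matches the HttpTransport protocol."""

    def _request(self, method: str, url: str, headers: dict[str, str], timeout: float) -> HttpResponse:
        request = urllib.request.Request(url, headers=headers, method=method)
        try:
            raw = urllib.request.urlopen(request, timeout=timeout)
        except urllib.error.HTTPError as err:
            # urlopen raises on 4xx/5xx; HTTPError is itself a readable response.
            return HttpResponse(err.code, {k.lower(): v for k, v in err.headers.items()}, err)
        return HttpResponse(raw.status, {k.lower(): v for k, v in raw.headers.items()}, raw)

    def get(self, url: str, headers: dict[str, str], timeout: float) -> HttpResponse:
        return self._request("GET", url, headers, timeout)

    def head(self, url: str, headers: dict[str, str], timeout: float) -> HttpResponse:
        return self._request("HEAD", url, headers, timeout)

    def close(self) -> None:
        """Nothing to release: urllib keeps no connection pool."""
```

Connection-level failures (`urllib.error.URLError`, timeouts) are deliberately left to propagate here; a real implementation would map them to `BackendUnavailable`.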

4.4 urllib limitations (verified)

No connection pooling. Each request opens a new TCP connection. Fine for occasional reads; poor for batch operations. requests/httpx sessions solve this.
No async. Acceptable — all existing backends are sync.
SSL works. urllib.request.urlopen validates TLS certificates by default via ssl.create_default_context(). verify_ssl=False would use ssl._create_unverified_context().
Redirects handled. urllib follows redirects automatically (up to a built-in limit). Custom max_redirects requires a subclassed handler.
Streaming works. Response object supports chunked read(size). See §11 for details.

5. LIST Capability: Why Not

HTTP has no native directory listing. The options considered:

Approach	Verdict	Reason
No LIST	Chosen	Clean, honest, no hacks
Manifest-based LIST	Deferred	Requires user to maintain sidecar file; could add later via `manifest_path` param
HTML index parsing	Rejected	Fragile (HTML varies by server), security risk (arbitrary HTML)
API-specific listing	Out of scope	Belongs in focused extensions (e.g., `ext.ckan`)

Users who need listing use an external catalog (API, manifest, database) to discover paths, then store.read() each one. If demand justifies it, a manifest_path constructor argument could upgrade the capability set to {READ, METADATA, LIST} in a future phase.

6. Complete Method Mapping

Every Backend ABC method and its HTTP implementation:

Method	Implementation	Notes
`name`	`"http"`	See §18 Q1 for naming rationale
`capabilities`	`{READ, METADATA}`	Fixed set
`exists(path)`	`HEAD` -> 200=True, 404=False
`is_file(path)`	`HEAD` -> 200=True, 404=False	HTTP resources are always "files"
`is_folder(path)`	Always `False`	No folder concept without LIST
`read(path)`	`GET` -> `_ErrorMappingStream(response)`	Non-seekable stream, see §11
`read_bytes(path)`	`GET` -> `response.read()`	Fully buffered
`write(...)`	Raise `CapabilityNotSupported`
`write_atomic(...)`	Raise `CapabilityNotSupported`
`open_atomic(...)`	Raise `CapabilityNotSupported`
`delete(...)`	Raise `CapabilityNotSupported`
`delete_folder(...)`	Raise `CapabilityNotSupported`
`list_files(...)`	Raise `CapabilityNotSupported`
`list_folders(...)`	Raise `CapabilityNotSupported`
`iter_children(...)`	Raise `CapabilityNotSupported`	Default impl calls list_files+list_folders; override to raise directly
`get_file_info(path)`	`HEAD` -> `FileInfo(...)`	See §12 for field mapping
`get_folder_info(path)`	Raise `NotFound`	No folder concept; consistent with `is_folder()` returning `False`
`move(...)`	Raise `CapabilityNotSupported`
`copy(...)`	Raise `CapabilityNotSupported`
`glob(...)`	Raise `CapabilityNotSupported`	Default impl already does this
`check_health()`	`HEAD base_url` -> raise `BackendUnavailable` on failure
`native_path(path)`	Return full URL string
`to_key(url)`	Strip `base_url` prefix
`close()`	Close transport (connection pool if applicable)	No-op for urllib
`unwrap(type_hint)`	Return underlying transport if type matches	e.g., `unwrap(httpx.Client)`

7. Error Mapping

HTTP Status	remote-store Error	Notes
200, 204	Success
301, 302, 307, 308	Follow redirect (up to limit)	Map final status
401, 403	`PermissionDenied`
404	`NotFound`
408, 429, 500, 502, 503, 504	`BackendUnavailable`	Transient
Other 4xx	`RemoteStoreError`	Generic
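The table translates into a small classification function. The exception classes below are local stand-ins with the same names as remote-store's errors (the real constructors may take different arguments), and treating statuses not listed in the table as generic `RemoteStoreError` is an assumption.

```python
from __future__ import annotations


class RemoteStoreError(Exception):
    """Stand-in for the library's base error."""


class NotFound(RemoteStoreError): ...


class PermissionDenied(RemoteStoreError): ...


class BackendUnavailable(RemoteStoreError): ...


_TRANSIENT = frozenset({408, 429, 500, 502, 503, 504})


def classify_status(status: int, path: str) -> RemoteStoreError | None:
    """Map a final (post-redirect) HTTP status to an error, or None on success."""
    if status in (200, 204):
        return None
    if status in (401, 403):
        return PermissionDenied(f"access denied for {path!r} (HTTP {status})")
    if status == 404:
        return NotFound(f"{path!r} not found")
    if status in _TRANSIENT:
        return BackendUnavailable(f"transient HTTP {status} for {path!r}")
    # Assumption: anything else falls through to the generic error.
    return RemoteStoreError(f"unexpected HTTP {status} for {path!r}")
```

Returning the exception instead of raising it lets `exists()` treat 404 as `False` while `read()` raises the same classification.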

8. Configuration & Auth

8.1 Constructor signature (sketch)

ReadOnlyHttpBackend(
    base_url: str,
    *,
    headers: dict[str, str] | None = None,   # custom headers (API keys, auth tokens)
    timeout: float = 30.0,                     # request timeout in seconds
    retry: RetryPolicy | None = None,          # retry config (same as S3/SFTP/Azure)
    http_client: str | None = None,            # force "urllib", "requests", or "httpx"
    verify_ssl: bool = True,                   # TLS verification
    max_redirects: int = 5,                    # redirect follow limit
)

8.2 Auth patterns

API key in header: headers={"Authorization": "Bearer <token>"} or headers={"X-API-Key": "<key>"}
No auth: Most open data portals need nothing
Advanced auth (OAuth, mutual TLS): Out of scope for v1. Users can pre-configure an httpx.Client and pass it via a future client parameter.

8.3 Registry integration

# store config
stores:
  opendata:
    backend: http
    base_url: "https://data.example.com/datasets/"
    options:
      timeout: 60
      headers:
        X-API-Key: "${OPENDATA_API_KEY}"

9. Composability with Existing Extensions

This is the primary value of making it a backend vs. standalone code:

Extension	Benefit
`ext.cache`	TTL-based caching of `read()` results — critical for HTTP, avoids repeated downloads
`ext.transfer`	`download(store, "dataset.csv", local_path)` — works out of the box
`ext.observe`	Instrument HTTP reads with callbacks (timing, logging)
`ext.batch`	`batch_exists(store, paths)` — check multiple resources
`ext.arrow`	`read_table(store, "data.parquet")` — read remote Parquet/CSV via PyArrow

The ext.cache composability alone justifies the backend approach over ad-hoc HTTP code.

Note: ext.glob requires LIST capability, so it won't work with this backend.

10. Conformance Suite Impact

10.1 Current state of capability-gating

The conformance suite (tests/backends/test_conformance.py) has 19 test classes with 69 test methods. Only two capabilities are currently gated:

Capability	Gated?	Tests
ATOMIC_WRITE	Yes	7 tests skip cleanly
GLOB	Yes	2 tests skip cleanly
WRITE, DELETE, LIST, MOVE, COPY, METADATA	No	~60 tests have no capability checks

10.2 What breaks for a {READ, METADATA} backend

Most test classes set up test data by calling backend.write() before asserting read behavior. This means even read/metadata tests will fail — not because the backend can't read, but because the test can't set up fixtures.

Tests that would need changes:

Test Class	Issue	Fix
`TestBackendExists`	Calls `write()` in setup	Gate on WRITE or use pre-seeded fixture
`TestBackendFileFolder`	Calls `write()` in setup	Gate on WRITE
`TestBackendRead`	Calls `write()` in setup	Gate on WRITE or pre-seed
`TestBackendWrite`	Tests write operations	Gate on WRITE
`TestBackendDelete`	Tests delete operations	Gate on DELETE
`TestBackendListing`	Tests list operations	Gate on LIST
`TestBackendIterChildren`	Tests list operations	Gate on LIST
`TestBackendMetadata`	Calls `write()` in setup	Gate on WRITE or pre-seed
`TestBackendMove`	Tests move operations	Gate on MOVE
`TestBackendCopy`	Tests copy operations	Gate on COPY
`TestStreamingConformance`	4 read tests (SIO-001) call `write()` for setup; 2 write tests (SIO-003) test write behavior	Read tests: gate on WRITE or pre-seed; write tests: gate on WRITE
`TestBackendToKey`	3 of 4 methods pass; `test_to_key_round_trip_with_listing` calls `write()` + `list_files()`	Gate 1 method on WRITE+LIST

Tests that pass as-is:

Test Class	Why
`TestBackendIdentity`	Only checks name, capabilities, repr
`TestBackendWriteAtomic`	Already gated on ATOMIC_WRITE
`TestBackendOpenAtomic`	Already gated on ATOMIC_WRITE
`TestBackendLifecycle`	Only checks that `close()` is callable
`TestBackendGlob`	Already gated on GLOB
`TestBackendUnwrap`	Only checks unwrap raises or returns
`TestBackendNativePath`	Only checks path round-trip

10.3 Proposed conformance changes

Two-pronged approach:

Add capability gates to test classes that test write/delete/move/copy operations. Pattern: if not backend.capabilities.supports(Capability.X): pytest.skip(...). This is the same pattern already used for ATOMIC_WRITE and GLOB. Straightforward, ~15 lines of changes.
Pre-seeded fixture for read-only backends. Tests that verify read behavior (TestBackendRead, TestBackendMetadata, TestBackendExists) need test data. For writable backends, they create it inline. For read-only backends, provide a conftest.py fixture that pre-seeds the HTTP mock server with test files. The conformance test checks backend.capabilities.supports(Capability.WRITE) — if true, write inline; if false, assume the fixture pre-seeded the data.

Estimated effort: Small. The capability-gating pattern is established. The pre-seeded fixture is the only new concept, and it's just a pytest-httpserver fixture that serves a few static files.
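Both prongs can be sketched without the real suite. `Capability` and `CapabilitySet` below are simplified stand-ins for remote-store's types, `require` and `ensure_fixture` are hypothetical helper names, and `unittest.SkipTest` (which pytest reports as a skip) stands in for `pytest.skip(...)` so the sketch runs on the standard library alone.

```python
from __future__ import annotations

import enum
import unittest
from typing import Callable


class Capability(enum.Enum):
    READ = enum.auto()
    WRITE = enum.auto()
    METADATA = enum.auto()


class CapabilitySet(frozenset):
    """Minimal stand-in exposing the supports() check used by the suite."""

    def supports(self, capability: Capability) -> bool:
        return capability in self


def require(capabilities: CapabilitySet, needed: Capability) -> None:
    """Prong 1: skip the calling test when the capability is missing."""
    if not capabilities.supports(needed):
        raise unittest.SkipTest(f"backend does not support {needed.name}")


def ensure_fixture(
    capabilities: CapabilitySet,
    write: Callable[[str, bytes], None],
    path: str,
    data: bytes,
) -> None:
    """Prong 2: write test data inline if possible, else rely on pre-seeding."""
    if capabilities.supports(Capability.WRITE):
        write(path, data)
    # Read-only backends: the fixture is assumed to have served this file already.
```

The second helper only works if the pre-seeded files match what writable backends create inline, so paths and contents would need to be shared constants.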

11. Stream Lifecycle (`read()` return value)

11.1 What the spec requires (SIO-001)

From spec 006-streaming-io.md: "The returned stream is not guaranteed to be seekable. Seekability is a backend-level property (e.g. local files are seekable, HTTP-based streams typically are not), not a Store API contract."

The streaming conformance tests (TestStreamingConformance) verify:
- Stream is not a BytesIO wrapper (must be a real stream)
- Chunked read(size) works
- Stream supports context manager protocol
- Seekability is NOT tested as a requirement

11.2 urllib.request response as BinaryIO

urllib.request.urlopen() returns http.client.HTTPResponse, which:
- Inherits from io.BufferedIOBase (not RawIOBase)
- Supports: read(size), readline(size), readinto(b), close()
- Reports seekable() -> False
- Has __enter__/__exit__ (context manager)

11.3 Wrapping with `_ErrorMappingStream`

_ErrorMappingStream delegates all I/O to the inner stream and maps exceptions to remote-store errors. It handles imperfect streams gracefully:
- seek() returning None (paramiko quirk) -> falls back to tell()
- seekable() missing -> returns False
- tell() returning None -> returns 0
- close() exceptions -> suppressed

Verdict: _ErrorMappingStream(http_response, ...) works directly.

Do NOT wrap in io.BufferedReader — unlike S3/SFTP backends, the HTTP response is already buffered (BufferedIOBase). Double-buffering would be wasteful and could cause issues.

Return pattern:

def read(self, path: str) -> BinaryIO:
    response = self._transport.get(self._url(path), ...)
    return cast("BinaryIO", _ErrorMappingStream(response.body, self._classify_error, path))

11.4 Connection lifecycle

The HTTP connection stays open while the stream is open. This is the same pattern as S3 (s3fs holds the connection) and SFTP (paramiko holds the channel). The stream's close() releases the connection.

For urllib, this means one TCP connection per open stream. For requests/httpx with session pooling, the connection returns to the pool on close.
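On the caller side, this lifecycle means the stream should be consumed in chunks and closed promptly. A minimal sketch, assuming only a non-seekable binary stream such as the one `read()` returns; `copy_stream` is a hypothetical helper, not part of remote-store.

```python
from typing import BinaryIO


def copy_stream(source: BinaryIO, sink: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy a non-seekable stream in chunks and close it when done.

    Closing the source is what releases the underlying HTTP connection.
    """
    total = 0
    with source:  # close() runs even if the copy fails midway
        while chunk := source.read(chunk_size):
            sink.write(chunk)
            total += len(chunk)
    return total
```

The helper never calls `seek()` or `tell()`, so it works unchanged on streams that report `seekable() -> False`.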

12. FileInfo Field Mapping from HTTP Headers

FileInfo field	HTTP header	Handling when missing
`path`	From request path	Always available
`name`	From path	Always available
`size`	`Content-Length`	`0` if missing (chunked transfer, dynamic content)
`modified_at`	`Last-Modified`	`datetime.min.replace(tzinfo=timezone.utc)` if missing
`checksum`	`ETag`	`None` (optional field)
`content_type`	`Content-Type`	`None` (optional field)
`extra`	All response headers	`{"headers": dict(response.headers)}`

Notes:
- Content-Length is absent for chunked responses and some CDNs. Using 0 as a fallback is imperfect: code checking file_info.size == 0 (e.g., skip-empty-file logic, progress bars, ext.transfer pre-allocation) would misinterpret "unknown" as "zero bytes". Since FileInfo.size is int (not Optional[int]), there is no clean sentinel today. The spec should note this as a known limitation that may warrant making size Optional in a future FileInfo revision.
- Last-Modified is absent on many static file hosts and CDNs. The sentinel must be timezone-aware (datetime.min.replace(tzinfo=timezone.utc)) because S3 and Azure backends return timezone-aware modified_at values, and mixing naive and aware datetimes raises TypeError in user code (sorting, filtering, ext.transfer comparisons). Sorting by modified_at would still place HTTP files at the beginning of any list. datetime(1970, 1, 1, tzinfo=UTC) (Unix epoch) is a more conventional sentinel but carries the same ambiguity. Like size, modified_at may warrant Optional treatment in a future FileInfo revision.
- ETag maps naturally to checksum: both are opaque identifiers for content versioning. Useful for ext.cache integration.
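The header handling in the table can be sketched with stdlib parsing. The function names are hypothetical, header keys are assumed to be lowercased by the transport, and falling back to the sentinel on an unparseable Last-Modified value is an assumption rather than a decided behaviour.

```python
from __future__ import annotations

from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Timezone-aware sentinel for "unknown", as proposed in the table above.
UNKNOWN_MODIFIED_AT = datetime.min.replace(tzinfo=timezone.utc)


def parse_size(headers: dict[str, str]) -> int:
    """Content-Length as int, or 0 when absent or malformed."""
    value = headers.get("content-length")
    return int(value) if value is not None and value.isdigit() else 0


def parse_modified_at(headers: dict[str, str]) -> datetime:
    """Last-Modified as an aware datetime, or the sentinel when unusable."""
    value = headers.get("last-modified")
    if value is None:
        return UNKNOWN_MODIFIED_AT
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return UNKNOWN_MODIFIED_AT
    # Guard against naive results so comparisons with S3/Azure values never raise.
    return parsed if parsed.tzinfo is not None else parsed.replace(tzinfo=timezone.utc)


def parse_checksum(headers: dict[str, str]) -> str | None:
    """ETag passed through verbatim (quotes and weak prefix included)."""
    return headers.get("etag")
```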

13. Prior Art: Build vs. Reuse

The key question is: can we use an existing library instead of writing our own HTTP backend? The answer is no — but the implementation is small enough that this is fine.

13.1 Why not wrap fsspec `HTTPFileSystem`?

fsspec's HTTP support is the closest match. It provides read-only HTTP access, streaming, and even directory listing via HTML parsing.

Why it doesn't fit:
- fsspec is a heavyweight dependency (pulls in aiohttp for HTTP). remote-store's core has zero runtime deps.
- fsspec's HTTPFileSystem exposes an AbstractFileSystem interface, not our Backend interface. Wrapping it would mean adapting every method; the wrapper would be roughly the same size as a direct implementation.
- Its HTML-based directory listing is fragile and not something we'd want.
- Its Range-based seeking adds complexity we don't need.

fsspec validates that the concept works, but there's nothing to reuse.

13.2 Why not wrap smart_open?

smart_open.open("https://...") gives a streaming reader. But:
- It's a single open() function, not a filesystem abstraction. No exists(), get_file_info(), check_health(), or any metadata support.
- Wrapping it would provide only read(); we'd still implement everything else ourselves.
- It requires requests as a dependency.

13.3 Why not use requests/httpx directly as the backend?

We do: that's exactly the transport layer (§4). The "build" here is the thin Backend adapter (~150 lines) that maps HTTP semantics to remote-store's interface. The actual HTTP work is delegated to urllib/requests/httpx.

13.4 What we learn from prior art

Project	Lesson for us
fsspec HTTPFileSystem	Concept is proven. Skip HTML listing and Range seeking.
smart_open	Simple HTTP read adapter has demand. We add metadata + composability on top.
Hugging Face Hub	Domain-specific HTTP access belongs in extensions, not the base backend.
DVC	ETag -> checksum mapping works well. Validates `ext.cache` integration pattern.

13.5 Implementation size estimate

The HTTP backend is a thin adapter over standard HTTP libraries. Estimated:
- Backend class: ~150 lines (method mapping, error mapping, path handling)
- Transport protocol + urllib impl: ~80 lines
- requests/httpx transports: ~50 lines each (optional)

This is much smaller than MemoryBackend (~505 lines) and S3Backend (~440 lines). Not a wheel worth importing — simpler to build.

14. Real-World HTTP Endpoint Behavior

Tested against representative public endpoints to validate assumptions:

Endpoint	Content-Length	Last-Modified	ETag	HEAD	Redirects
GitHub raw (raw.githubusercontent.com)	Yes	Yes	Yes (weak)	Yes	Yes (1 redirect from github.com)
PyPI simple index (pypi.org)	Yes	—	Yes	Yes	Yes (http->https)
PyPI package files (files.pythonhosted.org)	Yes	Yes	Yes	Yes	—
opendata.swiss (lindas API)	Varies	—	—	Yes	—
CDN-hosted static files (typical)	Yes	Yes	Yes	Yes	—

Findings:
- HEAD was supported by every endpoint tested, so exists() and get_file_info() can rely on it.
- Content-Length is present for static files, sometimes missing for API responses.
- Last-Modified is often missing on API endpoints and some CDNs.
- ETag is common on static file servers, rare on dynamic APIs.
- Redirects are common (http->https, domain aliases). Following redirects is mandatory.

15. Risk Assessment

Risk	Likelihood	Impact	Mitigation
urllib can't produce conformant streams	Low	High	Verified: `_ErrorMappingStream` wraps HTTPResponse correctly (§11)
Conformance suite changes break other backends	Low	Medium	Changes are additive (capability gates); existing backends unaffected
Scope creep toward WebDAV/write support	Medium	Medium	Hard boundary: backend name is `http`, not `webdav`; no write methods
`Content-Length` missing breaks FileInfo	Low	Medium	Use `size=0` fallback; document as known limitation (see §12)
Connection leak from unclosed streams	Medium	Medium	Same risk as S3/SFTP; `_ErrorMappingStream.close()` handles cleanup
urllib SSL issues on older Python	Low	Low	`ssl.create_default_context()` works on Python 3.10+

No showstoppers identified. The urllib streaming concern (P1.4 in the original gap analysis) is resolved — it works.

16. Testing Strategy

16.1 Backend-specific tests (`tests/backends/test_http.py`)

ID	Test	Spec
HTTP-001	`read()` returns streaming BinaryIO, chunked read works	SIO-001
HTTP-002	`read_bytes()` returns full content	BE-007
HTTP-003	`exists()` returns True for 200, False for 404	BE-004
HTTP-004	`get_file_info()` maps headers to FileInfo fields	BE-016
HTTP-005	`get_file_info()` handles missing Content-Length/Last-Modified	BE-016
HTTP-006	Error mapping: 401->PermissionDenied, 404->NotFound, 500->BackendUnavailable	ERR-*
HTTP-007	`native_path()` returns full URL	NPR-003
HTTP-008	`to_key()` strips base_url prefix	NPR-003
HTTP-009	Path with special characters is URL-encoded	—
HTTP-010	Custom headers are sent with every request	—
HTTP-011	Redirects are followed (up to limit)	—
HTTP-012	Timeout raises BackendUnavailable	—
HTTP-013	`check_health()` sends HEAD to base_url	BE-020
HTTP-014	Write/delete/move/copy raise CapabilityNotSupported	—
HTTP-015	`close()` is callable, releases transport	BE-020
HTTP-016	Transport auto-detection (urllib/requests/httpx)	—
HTTP-017	`is_folder()` always returns False	BE-005

16.2 Conformance suite participation

After adding capability gates (§10.3), the HTTP backend runs through the shared conformance suite. Expected results:

~12 tests pass (identity 5, lifecycle 1, to_key 3, unwrap 1, native_path 2)
~48 tests need capability gates (60 ungated minus 12 pass)
9 tests already gated and skip (ATOMIC_WRITE 7 + GLOB 2)
Total: 12 + 48 + 9 = 69 ✓
0 tests fail

16.3 Test infrastructure

Use pytest-httpserver (lightweight, no external deps) to create a local HTTP server in fixtures. Pre-seed with test files for read/metadata tests.

No real network calls in CI.

17. Implementation Checklist (SDD Pipeline)

Step	Item	Notes
1	Write spec `sdd/specs/032-http-backend.md`	Capability profile, method mapping, error mapping
2	Add capability gates to conformance suite	~15 lines, prerequisite for step 7
3	Implement `ReadOnlyHttpBackend`	`src/remote_store/backends/_http.py`
4	Implement `UrllibTransport`	Same file or `_http_transport.py`
5	Register in backend registry	`backends/__init__.py`, `from_dict()` support
6	Write backend-specific tests	`tests/backends/test_http.py`
7	Run conformance suite with HTTP backend	Verify skip/pass/fail counts
8	Add `RequestsTransport`, `HttpxTransport`	Optional extras
9	Add optional extras to `pyproject.toml`	`[requests]` and `[httpx]` groups (see §18 Q5)
10	Add docs: guide, API ref, examples	`docs-src/guides/`, `docs-src/api/`
11	Update CHANGELOG, BACKLOG	Per repo conventions

18. Resolved Questions

Questions from the original draft, now resolved with reasoning:

Q1. Backend name: "http" or "http-readonly" or "web"?

Use "http". Reasons:
- Consistent with other backend names ("local", "s3", "sftp", "azure"): none encode capabilities in the name.
- A future WebDAV backend would use "webdav", not "http": different protocol, different backend.
- The capability system communicates what the backend can do; the name identifies the protocol.

Q2. Should is_folder() always return False?

Yes. HTTP has no folder concept. Without LIST, there are no known prefixes to check against. If manifest-based LIST is added later, is_folder() can check whether a path is a known prefix in the manifest.

Q3. Streaming vs. buffered reads?

Streaming. Consistent with all other backends. urllib's HTTPResponse supports chunked read(size) and is already buffered (BufferedIOBase). Wrap in _ErrorMappingStream directly, no BufferedReader needed. Non-seekable per SIO-001 spec allowance.

Q4. Retry policy?

Accept the existing RetryPolicy in the constructor (like S3, SFTP, Azure do). Map its fields to urllib/requests/httpx retry mechanisms:
- urllib: Implement a simple retry loop around urlopen(), respecting max_attempts, backoff_base, backoff_max, jitter, and timeout. Retry on transient HTTP statuses (429, 500, 502, 503, 504) and connection errors. Honour Retry-After header when present.
- requests/httpx: Delegate to urllib3.Retry / httpx transport retry config, mapping RetryPolicy fields to native parameters.
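The urllib retry loop might be shaped like the sketch below. `RetryPolicy` here is a local stand-in carrying only the field names mentioned above (defaults are invented), exponential doubling and capping Retry-After at `backoff_max` are assumptions, and only integer-seconds Retry-After values are handled. Connection-error retries and the overall `timeout` budget are omitted for brevity.

```python
from __future__ import annotations

import random
import time
from dataclasses import dataclass
from typing import Callable

RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


@dataclass(frozen=True)
class RetryPolicy:
    """Stand-in for remote-store's RetryPolicy; defaults are illustrative."""

    max_attempts: int = 3
    backoff_base: float = 0.5
    backoff_max: float = 8.0
    jitter: bool = True


def backoff_delay(policy: RetryPolicy, attempt: int, retry_after: float | None = None) -> float:
    """Delay before the next attempt; a server-provided Retry-After wins."""
    if retry_after is not None:
        return min(retry_after, policy.backoff_max)
    delay = min(policy.backoff_base * 2 ** (attempt - 1), policy.backoff_max)
    return random.uniform(0, delay) if policy.jitter else delay


def send_with_retry(
    send: Callable[[], tuple[int, dict[str, str]]],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if policy.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        status, headers = send()
        if status not in RETRYABLE_STATUSES or attempt >= policy.max_attempts:
            return status
        value = headers.get("retry-after", "")
        retry_after = float(value) if value.isdigit() else None
        sleep(backoff_delay(policy, attempt, retry_after))
        attempt += 1
```

Injecting `sleep` keeps the loop testable without real delays.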

Q5. Extra dependency group name?

pip install remote-store[httpx] for httpx, pip install remote-store[requests] for requests. No [http] group — the baseline (urllib) needs no extra deps. This mirrors how [arrow] means "install PyArrow" and [otel] means "install OpenTelemetry".

Q6. Conformance suite changes?

Needed but small. Add capability gates to ~10 test classes (same pattern as existing ATOMIC_WRITE/GLOB gates). Pre-seed read-only test data via pytest-httpserver fixture. See §10.3 for details.

19. Recommendation

Proceed with a spec. No showstoppers found:

urllib streaming works with _ErrorMappingStream (verified).
Conformance suite changes are small and additive.
The capability system handles read-only gracefully.
Real-world HTTP endpoints behave as expected (HEAD, Content-Length, redirects).
Prior art (fsspec, smart_open, DVC) validates the concept.
Composability with ext.cache and ext.transfer delivers clear value.

Next steps:
1. Write spec 032-http-backend.md
2. Add conformance suite capability gates (prerequisite, benefits all future partial-capability backends)
3. Implement the initial urllib-only backend with {READ, METADATA} (Phase 2 in §20)

20. Implementation Plan

Formerly sdd/plans/plan-readonly-http-backend.md (consolidated).

Branch: claude/plan-readonly-http-backend-YhBCx

Phases

Phase	Scope	Notes
0	Conformance suite capability gates	Prerequisite — make suite safe for partial-capability backends
1	Spec `032-http-backend.md`	Formalize design as testable specification
2	`ReadOnlyHttpBackend` + `UrllibTransport` + registration	Zero new runtime deps
3	Backend-specific tests + conformance fixture	17 HTTP-specific scenarios via `pytest-httpserver`
4	Optional transports (requests, httpx) + extras	Higher-quality HTTP for users with these installed
5	Docs: guide, example, README, nav, CHANGELOG, BACKLOG	User-facing documentation

Commit strategy

Each commit passes hatch run all independently. One commit per phase.

Dependencies & test infrastructure

Dependency	Type	Purpose
`pytest-httpserver`	test-only	Mock HTTP server for backend tests + conformance fixture
`requests>=2.25.0`	optional extra	`RequestsTransport`
`httpx>=0.24.0`	optional extra	`HttpxTransport`

Risks & mitigations

Risk	Mitigation
Conformance gate changes affect other backends	Additive only (skip where unsupported); run full suite
urllib streaming edge cases	Research verified `_ErrorMappingStream` wraps HTTPResponse correctly
`size=0` when Content-Length missing	Document as known limitation in spec and guide