Config Loaders Specification
Overview
RegistryConfig gains two file-based loaders — from_toml() and from_yaml() —
that are thin translation layers over from_dict(). Both produce identical
RegistryConfig objects for equivalent input. The core config model does not change.
Backlog items: ID-005 (from_toml), ID-002 (from_yaml), ID-003 (Pydantic adapter)
Research: sdd/research/research-store-config.md
Constraint: ADR-0002 (no merging) — loaders are pre-processing steps that
produce a single immutable RegistryConfig.
TOML Loader
CFG-008: from_toml()
Invariant: RegistryConfig.from_toml(path, *, table=()) reads a TOML file
and returns a RegistryConfig.
Parameters:
- path: str | Path — Path to the TOML file.
- table: tuple[str, ...] = () — Dotted table path to extract. For
pyproject.toml, use table=("tool", "remote-store").
Behavior:
1. Parse the file via tomllib (3.11+) or tomli (3.10 backport).
2. If table is non-empty, traverse into the nested table.
3. Delegate to from_dict() (inherits Secret wrapping, validation).
Raises:
- ModuleNotFoundError if tomllib is unavailable and tomli is not installed.
Message includes install instructions: pip install 'remote-store[toml]'.
- KeyError if a table key is not found in the parsed data.
- FileNotFoundError if path does not exist.
- tomllib.TOMLDecodeError (tomli.TOMLDecodeError on Python 3.10) if the file is not valid TOML.
Postconditions: The returned RegistryConfig is identical to calling
from_dict() on the parsed TOML dict (after table traversal).
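The table traversal in step 2 can be sketched as follows. This is a minimal illustration that works on an already-parsed dict, not the library's actual code; the helper name extract_table and the error message wording are assumptions made for the example.

```python
from collections.abc import Mapping
from typing import Any


def extract_table(data: Mapping[str, Any], table: tuple[str, ...] = ()) -> Any:
    """Walk into nested tables, one key per level (hypothetical helper).

    Raises KeyError when a key along the path is missing, matching the
    behavior described for from_toml().
    """
    current: Any = data
    for depth, key in enumerate(table):
        if not isinstance(current, Mapping) or key not in current:
            walked = ".".join(table[: depth + 1])
            raise KeyError(f"table {walked!r} not found in parsed TOML")
        current = current[key]
    return current


# Shape of a parsed pyproject.toml that carries a [tool.remote-store] table.
parsed = {"tool": {"remote-store": {"backends": {}, "stores": {}}}}
section = extract_table(parsed, ("tool", "remote-store"))
```

With an empty table tuple the parsed dict is returned unchanged, which corresponds to loading a dedicated config file.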
CFG-009: TOML Dependency Shim
Invariant: On Python 3.11+, from_toml() uses the stdlib tomllib with
zero runtime dependencies. On Python 3.10, the tomli backport is required
(available via the toml optional extra).
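One common way to write such a shim is shown below. It is a sketch under the assumption that the import is resolved lazily inside a helper; the function name load_toml_module is hypothetical and the error text only approximates the message described in CFG-008.

```python
import sys
from types import ModuleType


def load_toml_module() -> ModuleType:
    """Return a module exposing load()/loads() for TOML (hypothetical helper)."""
    if sys.version_info >= (3, 11):
        import tomllib  # stdlib since Python 3.11, no extra dependency

        return tomllib
    try:
        import tomli  # backport with the same API for Python 3.10
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            "TOML support on Python 3.10 requires tomli: "
            "pip install 'remote-store[toml]'"
        ) from exc
    return tomli
```

Because tomli mirrors the tomllib API, the rest of the loader can call loads() or load() on whichever module is returned.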
YAML Loader
CFG-010: from_yaml()
Invariant: from_yaml(path) (in ext/yaml.py) reads a YAML file and
returns a RegistryConfig.
Location: remote_store/ext/yaml.py
Parameters:
- path: str | Path — Path to the YAML file.
Behavior:
1. Parse the file via yaml.safe_load (pyyaml) or ruamel.yaml safe parser.
2. Validate the top-level value is a dict.
3. Delegate to from_dict() (inherits Secret wrapping, validation).
Raises:
- ModuleNotFoundError if neither pyyaml nor ruamel.yaml is installed.
Message includes install instructions: pip install 'remote-store[yaml]'.
- FileNotFoundError if path does not exist.
- TypeError if the top-level YAML value is not a mapping.
- yaml.YAMLError (pyyaml) or ruamel.yaml.YAMLError (ruamel) if the file
is not valid YAML.
Design note: No key/table parameter. YAML has no shared-file convention
like pyproject.toml. Users with nested YAML use
yaml.safe_load(f)["key"] → from_dict(). A key parameter can be added later
without breaking changes if demand emerges.
CFG-011: YAML Library Precedence
Invariant: from_yaml() prefers pyyaml (yaml.safe_load). If pyyaml
is not installed, it falls back to ruamel.yaml (safe mode). If neither is
available, ModuleNotFoundError is raised.
Rationale: pyyaml is ubiquitous (~300M downloads/month) and simpler.
ruamel.yaml is a viable alternative but heavier. Accepting both avoids
forcing a specific library on users who already have one installed.
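The precedence rule can be sketched as a small selection helper. This is an illustration only: the name pick_yaml_parser is hypothetical, and the real module may structure its imports differently. The calls used are yaml.safe_load from pyyaml and YAML(typ="safe").load from ruamel.yaml.

```python
from typing import Any, Callable


def pick_yaml_parser() -> Callable[[Any], Any]:
    """Prefer pyyaml, fall back to ruamel.yaml in safe mode (hypothetical helper)."""
    try:
        import yaml  # pyyaml
    except ModuleNotFoundError:
        pass
    else:
        return yaml.safe_load

    try:
        from ruamel.yaml import YAML
    except ModuleNotFoundError as exc:
        raise ModuleNotFoundError(
            "YAML support requires pyyaml or ruamel.yaml: "
            "pip install 'remote-store[yaml]'"
        ) from exc
    # typ="safe" restricts loading to plain data types, like yaml.safe_load.
    return YAML(typ="safe").load
```

Either returned callable accepts YAML text or an open file object, so the caller does not need to know which library was selected.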
Cross-Cutting
CFG-012: Unknown Top-Level Keys Warning
Invariant: from_dict() emits a UserWarning for top-level keys other
than "backends" and "stores". This catches typos like "backend" (singular)
or "store" that would otherwise silently produce an empty config.
Behavior: Uses warnings.warn() with stacklevel adjusted so the
warning source points to user code. Direct from_dict() calls use
stacklevel=2; indirect calls via from_toml(), from_yaml(), and
from_pydantic() use stacklevel=3. Does not raise.
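The warning behavior can be sketched with the standard warnings module. The helper below stands in for the check inside from_dict(); its name, return value, and message text are assumptions for illustration.

```python
import warnings
from collections.abc import Mapping

_KNOWN_KEYS = frozenset({"backends", "stores"})


def warn_unknown_keys(data: Mapping[str, object], *, stacklevel: int = 2) -> list[str]:
    """Emit one UserWarning listing unexpected top-level keys (hypothetical helper).

    stacklevel=2 attributes the warning to the direct caller; a loader that
    wraps this call would pass 3 so the warning still points at user code.
    """
    unknown = sorted(set(data) - _KNOWN_KEYS)
    if unknown:
        warnings.warn(
            f"Unknown top-level config keys: {unknown}",
            UserWarning,
            stacklevel=stacklevel,
        )
    return unknown
```

The function warns and continues, so a typo such as "backend" is reported without blocking construction of the config.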
CFG-013: Loader Equivalence
Invariant: from_toml(), from_yaml(), from_dict(), and
from_pydantic() produce identical RegistryConfig objects for
semantically equivalent input. All Secret wrapping, type coercion, and
validation happens in from_dict() — the loaders/adapters are pure
format-to-dict translators.
CFG-014: Optional Extras
Invariant: pyproject.toml declares optional extras:
- toml: ["tomli>=1.1.0; python_version < '3.11'"]
- yaml: ["pyyaml>=5.1"]
- pydantic: ["pydantic-settings>=2.0.0"]
Pydantic Adapter
CFG-015: from_pydantic()
Invariant: from_pydantic(model) converts any Pydantic
BaseModel instance to a RegistryConfig.
Location: remote_store/ext/pydantic.py
Parameters:
- model: BaseModel — A Pydantic model whose model_dump() output conforms
to the RegistryConfig schema (i.e. has backends and stores keys).
Behavior:
1. Call model.model_dump() to produce a plain dict.
2. Delegate to RegistryConfig.from_dict() (inherits Secret wrapping,
unknown-key warning, validation).
Raises:
- ModuleNotFoundError if pydantic is not installed. Message includes
install instructions: pip install 'remote-store[pydantic]'.
- Any exception from from_dict() (e.g. TypeError, KeyError).
Postconditions: The returned RegistryConfig is identical to calling
RegistryConfig.from_dict(model.model_dump()).
SecretStr handling: Pydantic SecretStr fields in backend options
dicts are automatically unwrapped to plain strings before reaching
from_dict(), so from_dict()'s sensitive-key detection works correctly.
Users may use either str or SecretStr for credential values in their
models.
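The unwrapping step might look roughly like the sketch below. It assumes duck typing on get_secret_value(), the accessor that Pydantic's SecretStr provides; the helper name unwrap_secrets and the stand-in class FakeSecret are hypothetical, and FakeSecret exists only so the example runs without Pydantic installed.

```python
from typing import Any


def unwrap_secrets(value: Any) -> Any:
    """Recursively replace secret wrappers with plain values (hypothetical helper)."""
    getter = getattr(value, "get_secret_value", None)
    if callable(getter):
        return getter()
    if isinstance(value, dict):
        return {key: unwrap_secrets(item) for key, item in value.items()}
    if isinstance(value, list):
        return [unwrap_secrets(item) for item in value]
    return value


class FakeSecret:
    """Stand-in for pydantic.SecretStr, used only for this illustration."""

    def __init__(self, raw: str) -> None:
        self._raw = raw

    def get_secret_value(self) -> str:
        return self._raw


dumped = {"backends": {"s3": {"type": "s3", "options": {"secret": FakeSecret("s3cr3t")}}}}
plain = unwrap_secrets(dumped)
```

The result contains only plain strings, so the sensitive-key detection in from_dict() sees the same shape it would get from a hand-written dict.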
CFG-016: ADR-0002 Compatibility
Invariant: The Pydantic adapter operates entirely on the user side.
Pydantic's source merging (env vars, .env files, config files) happens
before from_pydantic() is called. The resulting
RegistryConfig is immutable and subject to ADR-0002 (no further merging).
Flow:
User's Pydantic model (merges env + .env + files)
→ from_pydantic() → from_dict()
→ RegistryConfig (immutable, ADR-0002 applies)
CFG-017: Extension Contract
Invariant: ext/pydantic.py follows the extension architecture (ADR-0008):
- Defines __all__.
- Uses only the public RegistryConfig.from_dict() API.
- Imported directly from remote_store.ext.pydantic (ADR-0013).
- Import of pydantic is guarded at module level with a clear error message.
Environment Variable Interpolation
Backlog item: ID-126
Motivation: Every backend that touches a remote system needs credentials.
Outside Pydantic, the dominant real-world pattern is env-var injection — but
from_yaml() and from_toml() offer no help, forcing users to write 10–15
lines of boilerplate (parse file → mutate dict → inject from os.environ →
call from_dict()). The Pydantic adapter solves this via BaseSettings, but
YAML/TOML users deserve the same ergonomics.
CFG-018: resolve_env() Function
Invariant: resolve_env(data, *, environ=None) recursively resolves
${VAR} placeholders in a config dict and returns a new dict.
Location: _config.py, public export via remote_store.
Parameters:
- data: dict[str, object] — Config dict (typically parsed from YAML/TOML).
- environ: Mapping[str, str] | None — Variable source. Defaults to
os.environ.
Returns: A deep copy of data with all placeholder strings resolved. The original dict is never mutated.
Raises:
- KeyError if a placeholder references a variable that is not set and has
no default. The error message includes the variable name and the config
key path where it was found.
Postconditions:
- Non-string values (numbers, booleans, None, nested dicts, lists) are
traversed but never modified.
- Keys are never interpolated — only values.
- The returned dict is suitable for RegistryConfig.from_dict().
CFG-019: Placeholder Syntax
Invariant: Two placeholder forms are supported:
| Form | Behavior |
|---|---|
| ${VAR} | Substitute the value of VAR; raise KeyError if unset |
| ${VAR:-default} | Substitute the value of VAR; use default if unset |
Rules:
1. Placeholders may appear as the entire value (${S3_KEY}) or embedded in
a larger string (https://${HOST}:${PORT}/path).
2. Multiple placeholders in one string are resolved left-to-right.
3. A full-value placeholder that resolves to a string remains a string; no
type coercion is applied. The YAML/TOML parser has already typed the value
as a string, because a value written with the ${} syntax always parses as one.
4. Literal ${ that should not be interpolated can be escaped as $${
(produces literal ${ in output).
5. Nested placeholders (${${INNER}}) are not supported.
6. Default values are literal strings — they do not undergo further
interpolation.
Syntax reference: Follows the Docker Compose ${VAR} / ${VAR:-default}
convention, which in turn mirrors POSIX shell parameter expansion. Spring Boot
and GitHub Actions use related but not identical placeholder syntaxes.
CFG-020: Loader Integration
Invariant: from_yaml() and from_toml() accept an optional
resolve_env_vars: bool = False keyword argument.
Behavior:
- When resolve_env_vars=False (default): no change to current behavior.
- When resolve_env_vars=True: the parsed dict is passed through resolve_env()
before delegation to from_dict().
Flow (YAML example):
YAML file → yaml.safe_load() → resolve_env() → from_dict()
→ RegistryConfig (immutable, ADR-0002 applies)
Design note: from_dict() does not gain a resolve_env_vars parameter.
It accepts already-constructed dicts where interpolation would be surprising.
Users who build dicts manually and want interpolation call resolve_env()
explicitly.
CFG-021: ADR-0002 Compatibility
Invariant: resolve_env() is a pre-processing step that runs before
RegistryConfig construction. It occupies the same architectural position as
Pydantic's BaseSettings env-var resolution (CFG-016): user-side glue that
produces a single, final dict. Once the RegistryConfig is constructed,
ADR-0002 applies — no further merging or env-var lookups.
Opt-in only: The default is resolve_env_vars=False. No loader reads
environment variables unless the caller explicitly opts in. This preserves
determinism, test safety, and the "same code = same behavior" guarantee.
No .env loading: resolve_env() reads from os.environ (or the
provided environ mapping). Loading .env files is the user's responsibility
(e.g. via python-dotenv). This keeps the function pure and predictable.
No vault integration: Secret managers (HashiCorp Vault, AWS Secrets
Manager, Azure Key Vault) populate env vars or provide their own SDKs.
resolve_env() consumes the result — it does not integrate with any
specific vault provider.
File Placement
| Component | Location |
|---|---|
| from_toml() | _config.py classmethod on RegistryConfig |
| from_yaml() | ext/yaml.py standalone function |
| resolve_env() | _config.py standalone function, public export |
| Unknown-key warning | _config.py inside from_dict() |
| Pydantic adapter | ext/pydantic.py |
| Optional extras | pyproject.toml [project.optional-dependencies] |
Example TOML
# remote-store.toml
[backends.local]
type = "local"
options.root = "/data/store"

[backends.s3-prod]
type = "s3"
options.bucket = "prod-data"
options.region_name = "eu-central-1"

[stores.raw-events]
backend = "s3-prod"
root_path = "events/raw"

[stores.local-cache]
backend = "local"
root_path = "cache"

# Python usage
config = RegistryConfig.from_toml("remote-store.toml")
# From pyproject.toml:
config = RegistryConfig.from_toml("pyproject.toml", table=("tool", "remote-store"))
Example YAML
# remote-store.yaml
backends:
  s3-prod:
    type: s3
    options:
      bucket: prod-data
      region_name: eu-central-1
stores:
  raw-events:
    backend: s3-prod
    root_path: events/raw
Example Pydantic
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict

from remote_store.ext.pydantic import from_pydantic


class BackendEntry(BaseModel):
    type: str
    options: dict[str, object] = {}


class StoreEntry(BaseModel):
    backend: str
    root_path: str = ""
    options: dict[str, object] = {}


class RemoteStoreSettings(BaseSettings):
    model_config = SettingsConfigDict(
        env_prefix="RS_",
        env_nested_delimiter="__",
    )
    backends: dict[str, BackendEntry] = {}
    stores: dict[str, StoreEntry] = {}


settings = RemoteStoreSettings()
config = from_pydantic(settings)
Example: YAML with Environment Variable Secrets
# remote-store.yaml
backends:
  s3-prod:
    type: s3
    options:
      bucket: prod-data
      region_name: eu-central-1
      key: ${AWS_ACCESS_KEY_ID}
      secret: ${AWS_SECRET_ACCESS_KEY}
  sftp:
    type: sftp
    options:
      host: files.vendor.com
      username: ${SFTP_USER}
      password: ${SFTP_PASSWORD:-}
stores:
  raw-events:
    backend: s3-prod
    root_path: events/raw

from remote_store.ext.yaml import from_yaml

# One line: env vars resolved, secrets wrapped, config immutable
config = from_yaml("remote-store.yaml", resolve_env_vars=True)
Example: TOML with Environment Variable Secrets
# remote-store.toml
[backends.s3-prod]
type = "s3"
options.bucket = "prod-data"
options.key = "${AWS_ACCESS_KEY_ID}"
options.secret = "${AWS_SECRET_ACCESS_KEY}"
[stores.raw-events]
backend = "s3-prod"
root_path = "events/raw"