RFC-0012: Documentation Graph Model
Status
Accepted
Summary
Define a machine-readable intermediate representation (IR) for the remote-store
API surface: a typed graph of nodes and edges derived from source code via static
analysis, serialized as JSON, and projected to multiple documentation outputs
(FEATURES.md capability matrix, per-class API reference stubs, extras index).
The graph is the single source of truth for any derived doc artefact. The derivable parts of FEATURES.md become a projection of it; curated, region-tagged sections stay hand-maintained. Versioned snapshots replace in-node since: / deprecated_in: tracking.
Motivation
Two documentation surfaces share the same drift problem:
- FEATURES.md (ID-159) is a hand-maintained capabilities snapshot. The capability matrix, method list, and extras table are all mechanically derivable from source.
- The API reference (docs/api/store/, docs/api/aio/, per-backend pages) has the same issue. Method tables, capability gates, and backend-conditional parameter admonitions are written by hand against the source and drift silently as the API evolves.
Both surfaces expose the same cross-cutting relationships — which capability gates which method, which extra enables which backend — but currently each is authored independently with no shared extraction logic.
Without a shared IR:
- FEATURES.md, the API reference, and the extras index each duplicate the same traversal logic.
- Cross-cutting relationships (`cap:WRITE` gates `Store.write`, `xtr:s3` enables `S3Backend`) are not queryable; they live in prose.
- A diff between releases requires diffing prose documents, not structured data.
This RFC defines the graph IR. Implementation (the loader, projection scripts, and release-skill integration) is tracked under ID-159.
Goals
- One typed graph; all derived doc artefacts are projections of it.
- The graph is queryable in plain Python; no SPARQL, no property-graph DB.
- Cross-cutting edges (capability → method, extra → backend, sync ↔ async mirror) are first-class, not buried in prose.
- Versioned snapshots replace `since:` tracking; git-diff two snapshots to produce a changelog.
- JSON serialization is byte-stable across regenerations of the same source tree (suitable for `git diff`).
- The graph accommodates conditional capabilities (ID-140: per-instance, dialect-conditional capabilities) without a schema change.
Non-goals
- Generating narrative prose. Curated sections of FEATURES.md remain hand-maintained (region-tagged).
- Replacing the mkdocstrings HTML renderer. The API reference is still rendered by mkdocstrings; the graph produces structured stubs and admonition metadata, not raw HTML.
- Storing type-inference results. Types are recorded as opaque `type_ref` node labels, not inferred by a type checker.
- Querying across multiple versions. The consumer loads two snapshots and diffs them; the IR itself is single-version.
Proposal
The graph: two collections
The IR is two sorted arrays: nodes and edges. Every node is addressable by a stable URI string, and edges refer to nodes by URI. There is no separate adjacency structure; queries iterate the arrays.
```json
{
  "schema_version": "1.2",
  "source_version": "0.24.0",
  "snapshot": "0.24.0",
  "nodes": [ ... ],
  "edges": [ ... ]
}
```
schema_version and source_version are independent fields. A schema bump
(new node kinds, renamed edge kinds, or new properties on an existing node kind)
increments schema_version; a new package release increments source_version.
snapshot mirrors source_version (both track pyproject.toml[project][version]).
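The sketch below shows one way to load the file. It assumes the top-level object shown above, and node objects that carry an `id` field (the sort key named under Determinism). `GraphDoc` and `load_graph` are hypothetical names. Rejecting a `snapshot`/`source_version` mismatch is one possible policy, not something this RFC mandates.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class GraphDoc:
    schema_version: str                 # bumps when the IR shape changes
    source_version: str                 # bumps with package releases
    nodes: dict[str, dict[str, Any]]    # node objects keyed by URI
    edges: list[dict[str, Any]]         # edge objects, in serialized order


def load_graph(text: str) -> GraphDoc:
    raw = json.loads(text)
    # `snapshot` is specified to mirror `source_version`; a mismatch points at
    # a stale or hand-edited file rather than a legitimate state.
    if raw["snapshot"] != raw["source_version"]:
        raise ValueError(
            f"snapshot {raw['snapshot']!r} != source_version {raw['source_version']!r}"
        )
    return GraphDoc(
        schema_version=raw["schema_version"],
        source_version=raw["source_version"],
        nodes={n["id"]: n for n in raw["nodes"]},
        edges=list(raw["edges"]),
    )
```

The two version fields are independent. A consumer can therefore branch on `schema_version` for parsing and display `source_version` to readers.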
Promote-to-node rule
If a value is referenced from more than one node, or a natural query is "which
X have property Y", it is a node. Closed enumerations that only describe their
parent (booleans, sync/async, abc/backend/facade) stay as properties.
Node taxonomy
| Kind | URI prefix | Key properties | Role property |
|---|---|---|---|
| package | pkg: | runtime (sync/async), version | n/a |
| module | mod: | file | n/a |
| class | cls: | role, runtime, file, line, summary | abc · backend · facade · extension · data · enum · error · helper |
| method | mtd: | summary, is_abstract, is_async, file, line | n/a |
| capability | cap: | summary, semantics | n/a |
| data_model | dm: | frozen, summary | n/a |
| field | fld: | default, summary | n/a |
| error | err: | when_raised, summary | n/a |
| extra | xtr: | kind (backend/extension/aggregate) | n/a |
| package_dep | dep: | min_version | n/a |
| parameter | prm: | has_default, summary | n/a |
| type_ref | typ: | label (opaque string) | n/a |
| predicate | prd: | label (opaque string, e.g. "dialect == 'sqlite'") | n/a |
| requirement | req: | mode (all/any, default all) | n/a |
| role | rol: | label (source/dest/self) | n/a |
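Each node kind owns a distinct URI prefix, so a loader can recover the kind from the URI alone and cross-check it against the node's declared kind. The sketch below does this. `KIND_BY_PREFIX` and `kind_of` are hypothetical helpers. Splitting on the first `:` is an assumption about URI syntax; it keeps `prd:` labels that themselves contain colons intact.

```python
# Prefix -> node kind, transcribed from the taxonomy table above.
KIND_BY_PREFIX = {
    "pkg": "package", "mod": "module", "cls": "class", "mtd": "method",
    "cap": "capability", "dm": "data_model", "fld": "field", "err": "error",
    "xtr": "extra", "dep": "package_dep", "prm": "parameter", "typ": "type_ref",
    "prd": "predicate", "req": "requirement", "rol": "role",
}


def kind_of(uri: str) -> str:
    """Return the node kind encoded in a URI's prefix, or raise ValueError."""
    prefix, sep, rest = uri.partition(":")
    if not sep or not rest or prefix not in KIND_BY_PREFIX:
        raise ValueError(f"not a graph URI: {uri!r}")
    return KIND_BY_PREFIX[prefix]
```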
predicate carries the condition for a conditional declares or raises edge
as an opaque label. The IR records it; the semantic interpretation belongs to
the code. This covers ID-140 (dialect-conditional capabilities) without a schema
change.
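The sketch below shows how a consumer might read conditional declarations. It makes two layout assumptions:

- Edges carry their attributes as top-level keys.
- Nodes keep their properties under a nested `props` object. They are nested because the `extra` node kind has its own `kind` property, which would otherwise collide with the node kind.

The `SqlBackend` class and the `prd:example.sqlite_only` URI are hypothetical.

```python
from __future__ import annotations

from typing import Any, Optional


def declared_capabilities(
    edges: list[dict[str, Any]],
    nodes_by_id: dict[str, dict[str, Any]],
    backend: str,
) -> list[tuple[str, Optional[str]]]:
    """(capability URI, predicate label or None) for each declares edge of `backend`.

    The label is passed through verbatim: the IR records the condition, and
    the code that consumes it decides what it means.
    """
    result = []
    for e in edges:
        if e["kind"] != "declares" or e["src"] != backend:
            continue
        cond = e.get("condition")
        label = None if cond is None else nodes_by_id[cond]["props"]["label"]
        result.append((e["dst"], label))
    return result


# Hypothetical fragment: one dialect-conditional and one unconditional capability.
NODES = {
    "prd:example.sqlite_only": {
        "id": "prd:example.sqlite_only",
        "kind": "predicate",
        "props": {"label": "dialect == 'sqlite'"},
    },
}
EDGES = [
    {"kind": "declares", "src": "cls:example.SqlBackend", "dst": "cap:ATOMIC_WRITE",
     "condition": "prd:example.sqlite_only"},
    {"kind": "declares", "src": "cls:example.SqlBackend", "dst": "cap:READ",
     "condition": None},
]
```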
requirement is an explicit AND/OR group between a method and its capability
gate(s). All current methods use mode: all with a single capability; the
node exists so the schema does not change when a method needs a conjunction or
disjunction.
Two patterns differ in whether they add `of` edges to one `req:` node or add more `req:` nodes:
- AND of capabilities (one `req:` node, N `of` edges): the method requires all N capabilities simultaneously on the same code path.
- Alternative gates (N `req:` nodes, each with its own `of` edge): the method chooses one capability gate at runtime based on a condition. Each `req:` node represents one branch. URI convention: `<method>.gate` for the primary gate; `<method>.gate_<discriminator>` for each alternative (e.g. `.gate_depth` for a depth-limited code path).
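The two patterns differ only in how `of` edges are grouped under `req:` nodes. The sketch below groups a method's capabilities by requirement, then evaluates one group against a set of declared capabilities. The edge layout, the `copy_tree`/`walk` methods, and both helper names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def gate_groups(edges: list[dict[str, Any]], method: str) -> dict[str, list[str]]:
    """{req URI: capability URIs ordered by the `of` edge's index} for one method."""
    reqs = {e["src"] for e in edges if e["kind"] == "gates" and e["dst"] == method}
    members: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for e in edges:
        if e["kind"] == "of" and e["src"] in reqs:
            members[e["src"]].append((e["index"], e["dst"]))
    return {req: [cap for _, cap in sorted(caps)] for req, caps in sorted(members.items())}


def group_satisfied(mode: str, caps: list[str], declared: set[str]) -> bool:
    # mode "all" (the default) is a conjunction; "any" is a disjunction.
    check = all if mode == "all" else any
    return check(c in declared for c in caps)


# Hypothetical edges for the two patterns.
EDGES = [
    # AND: one req: node, two `of` edges.
    {"kind": "gates", "src": "req:Store.copy_tree.gate", "dst": "mtd:example.Store.copy_tree"},
    {"kind": "of", "src": "req:Store.copy_tree.gate", "dst": "cap:READ", "index": 0},
    {"kind": "of", "src": "req:Store.copy_tree.gate", "dst": "cap:WRITE", "index": 1},
    # Alternative gates: two req: nodes, one `of` edge each.
    {"kind": "gates", "src": "req:Store.walk.gate", "dst": "mtd:example.Store.walk"},
    {"kind": "gates", "src": "req:Store.walk.gate_depth", "dst": "mtd:example.Store.walk"},
    {"kind": "of", "src": "req:Store.walk.gate", "dst": "cap:LIST", "index": 0},
    {"kind": "of", "src": "req:Store.walk.gate_depth", "dst": "cap:GLOB", "index": 0},
]
```

Because each alternative is its own group, a projection can list the branches separately instead of flattening them into one requirement.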
Edge taxonomy
| Kind | Domain → Range | Attributes | Notes |
|---|---|---|---|
| contains | package/module/class → child | n/a | Containment tree |
| inherits | class → class | n/a | DAG |
| declares | backend → capability | condition: prd URI \| null | null for unconditional |
| gates | requirement → method | n/a | Via req: group |
| of | requirement → capability | index: int | Members of the group |
| enables | extra → class/extension | n/a | pip extra → backend |
| requires_dep | extra → package_dep | n/a | pip dependency |
| mirrors | class → class | capability_delta: {async_only: [str], sync_only: [str]} | Canonical direction: async → sync peer (one edge per pair; deduped by generator). Capability lists are sorted; names are anchored to the canonical direction so async_only lists capabilities present on src (async) but absent on dst (sync). |
| composes | extension → class | n/a | The Store/Backend it wraps |
| requires_cap | extension/role → capability | n/a | Capability needed by ext/role |
| played_by | extension → role | n/a | Extension has this role on the edge |
| returns | method → data_model | n/a | n/a |
| accepts | method → data_model | param: str | n/a |
| has_param | method → parameter | position: int | n/a |
| typed | parameter/field → type_ref | n/a | n/a |
| has_field | data_model → field | n/a | n/a |
| raises | method → error | condition: prd URI \| null | n/a |
gates and of together replace a simple "method requires capability" attribute.
A method with one unconditional gate has one req: node linked by a single
of edge to the capability, and one gates edge from that req: to the method.
When a second capability is added, a second of edge joins the same req: node —
no schema change.
Diagram
```mermaid
graph LR
xtr -- enables --> cls_backend
cls_backend -. mirrors .- cls_async
cls_backend -- declares --> cap
cap <-- of -- req -- gates --> mtd
mtd -- returns --> dm
dm -- has_field --> fld
fld -- typed --> typ
mtd -- has_param --> prm
prm -- typed --> typ
mtd -- raises --> err
ext -- composes --> cls_backend
ext -- played_by --> rol
rol -- requires_cap --> cap
xtr -- requires_dep --> dep
```
Worked example: S3Backend neighborhood
One-hop walk from cls:remote_store.backends._s3.S3Backend, followed by the gating chain and return type of one method:

```text
cls:...S3Backend
  ← enables        xtr:s3
  → inherits       cls:remote_store._backend.Backend
  → declares (×13) cap:READ, cap:WRITE, cap:DELETE, cap:LIST, cap:GLOB,
                   cap:MOVE, cap:COPY, cap:ATOMIC_WRITE, cap:METADATA,
                   cap:USER_METADATA, cap:SEEKABLE_READ, cap:LAZY_READ,
                   cap:WRITE_RESULT_NATIVE
                   (all condition: null)
  ← mirrors        (none yet: no async S3 backend exists; once one is added,
                   its canonical async → sync edge points here)

mtd:remote_store.Store.write
  ← gates          req:Store.write.gate
req:Store.write.gate
  → of             cap:WRITE

mtd:Store.write → returns → dm:remote_store.WriteResult
dm:WriteResult → has_field → fld:WriteResult.etag → typed → typ:str|None
```
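The FEATURES.md capability matrix is a three-edge walk (backend → declares → cap ← of ← req → gates → method):

```python
def matrix(g):
    # backend -> {capability -> [methods gated by that capability]};
    # methods_gated_by / capabilities_declared_by are query helpers over the arrays.
    return {
        b: {c: list(methods_gated_by(g, c))
            for c in capabilities_declared_by(g, b)}
        for b in g.nodes(kind="class", role="backend")
    }
```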
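The walk assumes two helper queries. The sketch below shows one way to write them over the edge array. It uses a hypothetical `Graph` wrapper whose `nodes()` yields URIs, so they can key the result dict, and assumes nodes keep their properties under `props`. All names here are assumptions.

```python
from __future__ import annotations

from typing import Any, Iterator


class Graph:
    """Hypothetical query wrapper; every query is a linear scan of one array."""

    def __init__(self, nodes: list[dict[str, Any]], edges: list[dict[str, Any]]) -> None:
        self._nodes = nodes
        self._edges = edges

    def nodes(self, kind: str | None = None, **props: Any) -> Iterator[str]:
        # Yields URIs rather than node objects so results can key a dict.
        for n in self._nodes:
            p = n.get("props", {})
            if (kind is None or n["kind"] == kind) and all(
                p.get(k) == v for k, v in props.items()
            ):
                yield n["id"]

    def edges(self, kind: str) -> Iterator[dict[str, Any]]:
        return (e for e in self._edges if e["kind"] == kind)


def capabilities_declared_by(g: Graph, backend: str) -> list[str]:
    # backend -declares-> capability (conditional declarations included).
    return sorted({e["dst"] for e in g.edges("declares") if e["src"] == backend})


def methods_gated_by(g: Graph, cap: str) -> list[str]:
    # capability <-of- requirement -gates-> method
    reqs = {e["src"] for e in g.edges("of") if e["dst"] == cap}
    return sorted({e["dst"] for e in g.edges("gates") if e["src"] in reqs})


def matrix(g: Graph) -> dict[str, dict[str, list[str]]]:
    return {
        b: {c: methods_gated_by(g, c) for c in capabilities_declared_by(g, b)}
        for b in g.nodes(kind="class", role="backend")
    }
```

A method gated by an AND group appears under every capability in that group. The matrix therefore shows which capabilities a method touches. On its own, it does not show whether a backend can serve that method.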
Snapshots
One rolling file tracks the current development state. source_version and
snapshot are both set to the pyproject.toml [project][version] at
generation time; gen_graph.py reads this dynamically so no manual update is
needed.
| File | source_version | snapshot |
|---|---|---|
| docs-src/_data/graph/graph.json | current pyproject version | current pyproject version |
Files live in docs-src/_data/graph/ (git-tracked; mkdocs copies the
directory verbatim).
Release-time flow (Phase 2): after bump-my-version stamps the new version
into pyproject.toml, run hatch run gen-graph to re-stamp graph.json with
the release version before committing. The git tag then freezes the file;
source_version is self-describing for consumers without git context.
Consumer note: between releases graph.json reflects the pyproject version
at the last commit that touched it, not necessarily the in-progress dev state.
Use git history or git tag to identify the exact released snapshot.
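There are no in-tree archive files, so a changelog diff starts from two tagged copies of `graph.json`. The sketch below assumes the tags are reachable locally and that nodes and edges use the `id` and `(kind, src, dst)` keys from the determinism rules. `load_at_ref` and `graph_changes` are hypothetical. Edge identity here ignores attribute changes, such as a new `condition`.

```python
from __future__ import annotations

import json
import subprocess
from typing import Any

GRAPH_PATH = "docs-src/_data/graph/graph.json"


def load_at_ref(ref: str, path: str = GRAPH_PATH) -> dict[str, Any]:
    # `git show <ref>:<path>` prints the file exactly as committed at that ref.
    out = subprocess.run(
        ["git", "show", f"{ref}:{path}"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def graph_changes(old: dict[str, Any], new: dict[str, Any]) -> dict[str, list]:
    """Node and edge additions/removals between two snapshots."""
    old_nodes = {n["id"] for n in old["nodes"]}
    new_nodes = {n["id"] for n in new["nodes"]}
    old_edges = {(e["kind"], e["src"], e["dst"]) for e in old["edges"]}
    new_edges = {(e["kind"], e["src"], e["dst"]) for e in new["edges"]}
    return {
        "added_nodes": sorted(new_nodes - old_nodes),
        "removed_nodes": sorted(old_nodes - new_nodes),
        "added_edges": sorted(new_edges - old_edges),
        "removed_edges": sorted(old_edges - new_edges),
    }
```

Typical use is `graph_changes(load_at_ref("v0.23.0"), load_at_ref("v0.24.0"))`. The tag names are placeholders for the project's actual tag scheme.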
Determinism: the serializer must produce byte-identical output for the same source tree. Rules:
- `nodes` sorted ascending by `id` URI.
- `edges` sorted ascending by `(kind, src, dst)`.
- Object keys sorted (JSON `sort_keys=True`).
- LF line endings, no trailing whitespace, 2-space indent.
- One golden test: generate twice, assert the files are identical.
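The rules above map directly onto the standard library `json` module, as the sketch below shows. `serialize` and `write_graph` are hypothetical names. Breaking ties between edges that share `(kind, src, dst)` by their canonical JSON is an addition of this sketch, not part of the rules. List values inside attributes, such as the `capability_delta` lists, must already be sorted by the generator, because `sort_keys` only orders object keys.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def _edge_key(e: dict[str, Any]) -> tuple[str, str, str, str]:
    # Primary order is (kind, src, dst); the edge's own canonical JSON breaks
    # ties between parallel edges (an assumption, not one of the stated rules).
    return (e["kind"], e["src"], e["dst"], json.dumps(e, sort_keys=True))


def serialize(doc: dict[str, Any]) -> str:
    ordered = dict(doc)
    ordered["nodes"] = sorted(doc["nodes"], key=lambda n: n["id"])
    ordered["edges"] = sorted(doc["edges"], key=_edge_key)
    # With indent set, json uses "," (no trailing space) as the item separator,
    # so no line ends in whitespace; keys are sorted at every nesting level.
    return json.dumps(ordered, sort_keys=True, indent=2) + "\n"


def write_graph(doc: dict[str, Any], path: Path) -> None:
    # newline="\n" keeps LF line endings even on Windows.
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        fh.write(serialize(doc))
```

The golden test then reduces to generating twice and comparing the resulting bytes.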
Projection design
Three layers between source and rendered Markdown:
One projection function per output; all query logic lives in Python, not Jinja:
| Output | Projection input | Primary walk |
|---|---|---|
| FEATURES.md capability matrix | All backend nodes | backend → declares → cap ← of ← req → gates → method |
| Per-backend reference page | One backend node | backend → declares; mirrors (either end); extra → enables → backend |
| Capability × method table | All capability nodes | cap ← of ← req → gates → method |
| Extras index | All extra nodes | extra → enables → backend, extra → requires_dep |
The projection returns plain Python dataclasses (view objects); templates render only. This makes the test surface trivial: snapshot the projection output, not the rendered Markdown.
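The sketch below shows one projection, the extras index. It returns frozen dataclasses that a template would only format. `ExtraView`, `project_extras`, and the nested `props` node layout are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ExtraView:
    """One row of the extras index; templates only format these fields."""
    name: str                    # pip extra name, e.g. "s3" for xtr:s3
    extra_kind: str              # backend / extension / aggregate
    enables: tuple[str, ...]     # class/extension URIs
    requires: tuple[str, ...]    # package_dep URIs


def _targets(doc: dict[str, Any], src: str, kind: str) -> tuple[str, ...]:
    return tuple(sorted(e["dst"] for e in doc["edges"]
                        if e["kind"] == kind and e["src"] == src))


def project_extras(doc: dict[str, Any]) -> list[ExtraView]:
    # Walk: extra -enables-> class/extension and extra -requires_dep-> package_dep.
    return [
        ExtraView(
            name=n["id"].split(":", 1)[1],
            extra_kind=n["props"]["kind"],
            enables=_targets(doc, n["id"], "enables"),
            requires=_targets(doc, n["id"], "requires_dep"),
        )
        for n in sorted(doc["nodes"], key=lambda n: n["id"])
        if n["kind"] == "extra"
    ]
```

The view objects are frozen dataclasses with a deterministic `repr`, so a snapshot test can compare them directly without rendering any Markdown.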
Schema evolution
Older snapshots are frozen at their schema_version. The chosen strategy is
read-only legacy: loaders support the lowest common subset across known
schema versions; projections degrade gracefully when a node kind or edge kind
is absent. Forklift-upgrade (re-running the generator against every historical
tag) is deferred until a second schema version is actually needed.
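Here is one example of graceful degradation. `capability_delta` arrived with schema 1.2, so a projection reading an older snapshot should treat a missing delta as unknown rather than as symmetric. The sketch below uses hypothetical helper names.

```python
from __future__ import annotations

from typing import Any, Optional


def mirror_delta(edge: dict[str, Any]) -> Optional[dict[str, list[str]]]:
    # Schema 1.2+ always writes capability_delta (empty lists when symmetric),
    # so a missing key means "not recorded", never "symmetric".
    return edge.get("capability_delta")


def describe_mirror(edge: dict[str, Any]) -> str:
    head = f"{edge['src']} mirrors {edge['dst']}"
    delta = mirror_delta(edge)
    if delta is None:
        return f"{head} (capability delta not recorded)"
    if not delta["async_only"] and not delta["sync_only"]:
        return f"{head} (symmetric)"
    async_only = ", ".join(delta["async_only"]) or "none"
    sync_only = ", ".join(delta["sync_only"]) or "none"
    return f"{head} (async only: {async_only}; sync only: {sync_only})"
```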
Tooling appendix
This section is informational; implementation decisions belong to ID-159.
Loader. Griffe is already loaded by mkdocstrings (configured in
mkdocs.yml). The graph generator reuses Griffe's parse via a Griffe
Extension that fires on on_class_instance to populate declares edges from
each backend's capabilities property, and on on_module to collect extras
from pyproject.toml. Griffe is not the IR; it is the parse layer.
Extras → backend mapping. The mapping is a two-source join:
- `src/remote_store/backends/__init__.py`: each `try`/`except ImportError` block names the backend class and, implicitly, the package whose absence causes the failure (e.g. `from remote_store.backends._s3 import S3Backend` fails when `s3fs` is absent).
- `pyproject.toml` `[project.optional-dependencies]`: maps each pip extra name to the packages it installs.
The generator joins the two by package name: "which extra installs the package
that backends/__init__.py needs for this class?" This requires no
hand-maintained table and stays correct as new backends are added, as long as
both sources are consulted.
Static capability extraction. Today capabilities are declared as module-level CapabilitySet constants and exposed via a capabilities property. The lightest annotation that makes them statically extractable without running the code is a `CAPABILITIES: ClassVar` annotation directly on each backend subclass. This is the recommended precondition for ID-159. It is a small, non-breaking refactor of each backend.
Gating table. Today, capability gating in _store.py is done via inline
.require() calls scattered across each method (see References). The recommended
precondition for ID-159 is to consolidate these into a central _GATING dict
mapping method names to Capability values — making it both the runtime check
source and the static extraction target for gates edges. This is the same
category of precondition as CAPABILITIES: ClassVar on backends.
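The sketch below shows one table serving both roles. `Capability` here is a stand-in enum. The `_GATING` contents, `CapabilityError`, `require`, and `gating_edges` are hypothetical. The URI shapes follow the worked example (`req:Store.write.gate`, `mtd:remote_store.Store.write`). For brevity the table is read in-process; a fully static generator would read the same dict literal from the AST of `_store.py`.

```python
from __future__ import annotations

import enum
from typing import Any


class Capability(enum.Enum):   # stand-in for the enum in _capabilities.py
    READ = "read"
    WRITE = "write"
    DELETE = "delete"


# Hypothetical central table: Store method name -> required capability.
_GATING: dict[str, Capability] = {
    "read": Capability.READ,
    "write": Capability.WRITE,
    "delete": Capability.DELETE,
}


class CapabilityError(Exception):   # hypothetical; stands in for what .require() raises
    pass


def require(method: str, declared: set[Capability]) -> None:
    # Runtime side: one lookup replaces the scattered inline .require() calls.
    cap = _GATING.get(method)
    if cap is not None and cap not in declared:
        raise CapabilityError(f"{method} requires {cap.name}")


def gating_edges(owner: str = "remote_store.Store") -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    # Static side: one req: node per method, one `of` edge (index 0), one `gates` edge.
    short = owner.rsplit(".", 1)[-1]
    nodes: list[dict[str, Any]] = []
    edges: list[dict[str, Any]] = []
    for method, cap in sorted(_GATING.items(), key=lambda kv: kv[0]):
        req = f"req:{short}.{method}.gate"
        nodes.append({"id": req, "kind": "requirement", "props": {"mode": "all"}})
        edges.append({"kind": "of", "src": req, "dst": f"cap:{cap.name}", "index": 0})
        edges.append({"kind": "gates", "src": req, "dst": f"mtd:{owner}.{method}"})
    return nodes, edges
```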
Alternatives Considered
Griffe tree as IR
Griffe's object tree is containment-only. Cross-cutting edges (capability gates
method, extra enables backend, sync mirrors async) have no natural home.
Griffe's extra dict can carry annotations per node, but it cannot represent
edges between nodes. The proposed graph is loaded from Griffe, not as
Griffe.
RDF / property-graph database
No SPARQL queries are needed. All queries are simple array iterations in
Python. RDF does not survive git diff as cleanly as JSON. Rejected.
docspec (Pydoc-Markdown's IR)
A published Python API IR format. Griffe is already in the dependency set (via mkdocstrings); adding docspec would be a second parse layer for no gain. Rejected.
Versioning as a graph axis (since: / deprecated_in: on edges)
Adding version fields to every edge grows the schema and the serialization
cost. Per-file snapshots give the same information for free via git diff: a
node present in graph-0.24.0.json and absent in graph-0.23.0.json was
added in 0.24.0. Rejected.
Single rolling file (no per-version snapshots)
Adopted (ID-163). The original proposal kept per-version archive files
(graph-X.Y.Z.json) alongside the rolling file. During implementation the
archive step was dropped in favour of simplicity: graph.json is always
stamped with the current pyproject.toml version; the git tag is the
immutable record of the released state. Diffing two releases means checking out
the two tags and comparing graph.json. This trades the convenience of
in-tree snapshot artefacts for a smaller file surface and a simpler release
sequence.
Impact
- Public API: none. This RFC defines a build-time artefact, not a runtime surface.
- Backwards compatibility: not applicable.
- Performance: graph generation is a build-time step, not a hot path. The JSON file for the current backend surface is expected to be under 1 MB.
- Testing: one golden test (byte-stable round-trip); snapshot tests for each projection function. No runtime tests.
- Ripple-check: this RFC is design-only. The implementation PR for ID-159 will touch backends (adding `CAPABILITIES: ClassVar`), `_store.py` (adding the `_GATING` table), `docs-src/_data/graph/`, `scripts/`, the release skill, and `FEATURES.md`.
Open Questions
Sync↔async peer discovery for mirrors edges. Resolved by ID-159: each
async backend carries a __mirror__: ClassVar[type[T]] annotation pointing
to its sync peer. The generator emits one directed edge per pair in the
canonical async → sync direction (deduped). Consumers that need to query from
the sync side must reverse the edge themselves.
Capability asymmetry between mirror peers. Resolved by ID-162 (schema 1.2):
async and sync peers may declare different capability sets (e.g.
AsyncMemoryBackend includes LAZY_READ; MemoryBackend does not). Each
mirrors edge carries a capability_delta object with sorted async_only and
sync_only capability-name lists so consumers can render the asymmetry instead
of treating the peers as equivalent. The lists are always present (empty when
the peers are symmetric).
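The sketch below emits `mirrors` edges under the two resolutions above. The `classes` input is a hypothetical pre-extracted map in which `mirror` stands for the async class's `__mirror__` value. Capability names are compared as plain strings.

```python
from __future__ import annotations

from typing import Any


def mirrors_edges(classes: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """One canonical async -> sync `mirrors` edge per peer pair.

    `classes` maps class URI -> {"runtime": "sync" | "async",
    "mirror": peer URI or None, "caps": set of capability names}.
    """
    edges = []
    for src in sorted(classes):
        info = classes[src]
        dst = info.get("mirror")
        # Only async classes point at a peer, so emitting from them alone gives
        # the canonical direction with exactly one edge per pair.
        if info["runtime"] != "async" or dst is None:
            continue
        a, s = set(info["caps"]), set(classes[dst]["caps"])
        edges.append({
            "kind": "mirrors",
            "src": src,
            "dst": dst,
            "capability_delta": {
                "async_only": sorted(a - s),  # on the async src, absent on the sync dst
                "sync_only": sorted(s - a),   # on the sync dst, absent on the async src
            },
        })
    return edges
```

A consumer starting from the sync side can invert the edges once, for example with `{e["dst"]: e["src"] for e in edges}`.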
References
- ID-159 (FEATURES.md hybrid generation): sdd/BACKLOG.md
- ID-140 (dialect-conditional capabilities): sdd/BACKLOG.md
- Backend capabilities: src/remote_store/backends/_local.py:26, _s3.py:37, _sftp.py:47, _azure.py:47, _sqlalchemy.py:47, _http.py:41
- Capability gating in Store: src/remote_store/_store.py:68, 87, 118, 189
- Capability enum: src/remote_store/_capabilities.py
- mkdocs + mkdocstrings config: mkdocs.yml
- Existing gen-files script: scripts/gen_pages.py
- Extension architecture: sdd/adrs/0008-extension-architecture.md
- Backend adapter contract: sdd/specs/003-backend-adapter-contract.md