Audit 008 — Package Bugs (User-Facing)
Date: 2026-04-02
Scope: All source code under src/remote_store/ — core store, backends,
extensions, async API, configuration.
Method: Systematic code review of all 52 source files, followed by
executable reproduction scripts for each candidate bug. Only findings
with confirmed failing reproductions are included.
Summary
| ID | Component | Severity | Title |
|---|---|---|---|
| B-1 | ext/cache | High | CachedStore: write doesn't invalidate parent directory metadata |
| B-2 | ext/cache | High | CachedStore.child() creates isolated cache — cross-store coherence lost |
| B-3 | _config | Medium | RegistryConfig.from_dict crashes on YAML/TOML null options |
| B-4 | _config | Medium | RegistryConfig.from_dict silently converts null to string "None" |
| B-5 | ext/partition | Low | partition_path allows = in key — round-trip with parse_partition fails |
B-1 — CachedStore: write doesn't invalidate parent directory metadata
File: src/remote_store/ext/cache.py lines 288–292 (_invalidate_path)
Severity: High
Description
When writing a file at a nested path (e.g. dir/file.txt), the backend
implicitly creates intermediate directories. CachedStore._invalidate_path
only invalidates cache entries for the exact path written, not for any
parent directories. This leaves exists(), is_file(), and is_folder()
cache entries for parent paths stale.
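To make the gap concrete, the sketch below lists which per-path entries a nested write would need to drop, ancestors included. It is a minimal illustration under assumptions: the helpers `ancestor_paths` and `invalidate_with_parents`, the plain-dict cache, and the three prefix strings are hypothetical stand-ins, not the real contents of `_PATH_PREFIXES` or the actual `CachedStore` internals.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for _PATH_PREFIXES; the real values may differ.
PATH_PREFIXES = ("exists", "is_file", "is_folder")


def ancestor_paths(path: str) -> List[str]:
    """Return every parent of a '/'-separated path, nearest parent first."""
    parts = [p for p in path.split("/") if p]
    return ["/".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def invalidate_with_parents(cache: Dict[Tuple[str, str], object], path: str) -> None:
    """Drop cached metadata for the path itself and for all of its ancestors."""
    for target in [path, *ancestor_paths(path)]:
        for prefix in PATH_PREFIXES:
            cache.pop((prefix, target), None)


# Toy cache holding a stale "does not exist" entry for the parent directory.
toy_cache = {("exists", "newdir"): False, ("exists", "other"): False}
invalidate_with_parents(toy_cache, "newdir/file.txt")
```

Only the written path and its ancestors are touched, so unrelated entries such as `other` stay cached.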
Reproduction
from remote_store.backends._memory import MemoryBackend
from remote_store._store import Store
from remote_store.ext.cache import cache
store = Store(MemoryBackend())
cached = cache(store, ttl=300)
# Cache "newdir" as non-existent
assert cached.exists("newdir") == False # cached
# Write creates "newdir" implicitly
cached.write("newdir/file.txt", b"hello", overwrite=True)
# BUG: returns stale False — should be True
assert cached.exists("newdir") == False # stale!
assert store.exists("newdir") == True # actual truth
Execution result
exists("newdir") before write: False
exists("newdir") after write: False ← stale
is_folder("newdir") after write: False ← stale
Root cause
_invalidate_path(path) at line 288 iterates _PATH_PREFIXES and deletes
(prefix, path) — only the leaf path — then calls _invalidate_listings()
(line 292), which clears all listing/glob cache entries. Listing operations
like list_files("newdir") therefore return fresh results. However, per-path
metadata (exists, is_folder, is_file) for parent segments (e.g.
"newdir" when writing "newdir/file.txt") is not invalidated.
B-2 — CachedStore.child() creates isolated cache
File: src/remote_store/ext/cache.py lines 443–451 (_wrap_child)
Severity: High
Description
CachedStore._wrap_child() creates a new CachedStore with
cache_backend=None, meaning a fresh MemoryCache. The child and parent
have completely independent caches. Writing through the child does not
invalidate the parent's cached entries, so the parent returns stale data.
Reproduction
from remote_store.backends._memory import MemoryBackend
from remote_store._store import Store
from remote_store.ext.cache import cache
store = Store(MemoryBackend())
cached_store = cache(store, ttl=300)
cached_store.write("sub/file.txt", b"version1", overwrite=True)
content1 = cached_store.read_bytes("sub/file.txt") # cached
child = cached_store.child("sub")
child.write("file.txt", b"version2", overwrite=True) # child cache only
# BUG: parent returns stale cached content
content2 = cached_store.read_bytes("sub/file.txt")
assert content2 == b"version1" # stale!
assert store.read_bytes("sub/file.txt") == b"version2" # truth
Execution result
Parent reads: b'version1'
Parent reads after child write: b'version1' ← stale
Actual value in store: b'version2'
Root cause
_wrap_child at line 444 passes cache_backend=None, which triggers
MemoryCache() construction at line 240. The parent's cache instance is
never shared or linked.
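The difference between an isolated and a shared cache can be illustrated with a toy model. This is a sketch only: `ToyCachedStore`, its dict-based backend, and the `share_cache` flag are hypothetical and do not mirror the real `CachedStore` constructor. It also assumes cache keys are full paths relative to the root store, so that parent and child address the same entry.

```python
from typing import Dict, Optional


class ToyCachedStore:
    """Hypothetical read-through cache over a dict that stands in for a backend."""

    def __init__(
        self,
        data: Dict[str, bytes],
        prefix: str = "",
        cache: Optional[Dict[str, bytes]] = None,
    ) -> None:
        self._data = data
        self._prefix = prefix
        # Compare with None explicitly: an empty shared dict is falsy.
        self._cache = {} if cache is None else cache

    def _full(self, path: str) -> str:
        return f"{self._prefix}/{path}" if self._prefix else path

    def read_bytes(self, path: str) -> bytes:
        key = self._full(path)
        if key not in self._cache:
            self._cache[key] = self._data[key]
        return self._cache[key]

    def write(self, path: str, content: bytes) -> None:
        key = self._full(path)
        self._data[key] = content
        self._cache.pop(key, None)  # invalidate under the full-path key

    def child(self, prefix: str, share_cache: bool = True) -> "ToyCachedStore":
        shared = self._cache if share_cache else None
        return ToyCachedStore(self._data, self._full(prefix), shared)


backend: Dict[str, bytes] = {}
parent = ToyCachedStore(backend)
parent.write("sub/file.txt", b"version1")
parent.read_bytes("sub/file.txt")  # populates the shared cache

child = parent.child("sub")
child.write("file.txt", b"version2")  # evicts the entry the parent cached
latest = parent.read_bytes("sub/file.txt")
```

With `share_cache=False` the toy reproduces the stale read described above, because the child's eviction never reaches the parent's dictionary.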
B-3 — RegistryConfig.from_dict crashes on YAML/TOML null options
File: src/remote_store/_config.py lines 246 and 270
Severity: Medium
Description
When a YAML or TOML config file specifies options: with no value (producing
Python None), from_dict evaluates dict(cfg.get("options", {})) for backends
and dict(prof.get("options", {})) for store profiles. Because the key exists
with a null value, get returns None instead of the default, and dict(None)
raises an unhelpful TypeError.
Reproduction
from remote_store._config import RegistryConfig
config_data = {
    "backends": {
        "mybackend": {
            "type": "memory",
            "options": None,  # YAML "options:" with no value
        }
    },
    "stores": {},
}
RegistryConfig.from_dict(config_data)
# TypeError: 'NoneType' object is not iterable
Execution result
Root cause
Line 246: options = dict(cfg.get("options", {})). When the key "options"
exists with value None, .get("options", {}) returns None (the key is
present), and dict(None) raises TypeError. The same pattern exists at
line 270 for store profiles.
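The `dict.get` behaviour behind this crash, and one None-safe alternative, can be sketched as follows. The helper `read_options` is hypothetical. Treating a null value as an empty mapping is one possible policy; raising a validation error instead would be equally defensible.

```python
from collections.abc import Mapping
from typing import Any, Dict


def read_options(cfg: Mapping) -> Dict[str, Any]:
    """Return a copy of cfg['options'], treating a missing or null value as empty."""
    raw = cfg.get("options")
    if raw is None:
        return {}
    if not isinstance(raw, Mapping):
        raise TypeError(f"'options' must be a mapping, got {type(raw).__name__}")
    return dict(raw)


cfg = {"type": "memory", "options": None}

# The default passed to .get() is ignored because the key is present.
value_with_default = cfg.get("options", {})

# The helper normalises the null to an empty dict instead of crashing.
options = read_options(cfg)
```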
B-4 — RegistryConfig.from_dict silently converts null to string "None"
File: src/remote_store/_config.py lines 257, 268–269
Severity: Medium
Description
str(None) produces the string "None" in Python. When YAML/TOML config
files contain null values for type, backend, or root_path, from_dict
silently converts them to the literal string "None" instead of raising a
validation error. This causes files to be stored under a None/ prefix or
backend lookups to fail with a confusing "unknown backend type 'None'" error.
Reproduction
from remote_store._config import RegistryConfig
from remote_store.backends._memory import MemoryBackend
from remote_store._store import Store
config = RegistryConfig.from_dict({
"backends": {"b": {"type": "memory"}},
"stores": {"s": {"backend": "b", "root_path": None}},
})
print(repr(config.stores["s"].root_path)) # 'None'
# Using this profile corrupts paths:
store = Store(MemoryBackend(), root_path=config.stores["s"].root_path)
store.write("file.txt", b"data", overwrite=True)
# File stored at "None/file.txt" instead of "file.txt"
Execution result
Root cause
str() coercion at lines 257 (str(cfg["type"])), 268 (str(prof["backend"])),
and 269 (str(prof.get("root_path", ""))) treats None as a valid input
instead of validating that required fields are strings.
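A validation step in place of the bare str() coercion might look like the sketch below. The helper name `require_str` and its error messages are hypothetical, and rejecting every non-string value is an assumption about the desired policy, not a description of the library's behaviour.

```python
from typing import Any, Mapping, Optional


def require_str(section: Mapping[str, Any], key: str, default: Optional[str] = None) -> str:
    """Return section[key] if it is a string; reject null and other types.

    A missing key falls back to `default` when one is given and is an
    error otherwise. A key that is present with a null value is always an error.
    """
    if key not in section:
        if default is None:
            raise ValueError(f"missing required field {key!r}")
        return default
    value = section[key]
    if not isinstance(value, str):
        raise ValueError(
            f"field {key!r} must be a string, got {type(value).__name__}"
        )
    return value


profile = {"backend": "b"}
backend_name = require_str(profile, "backend")
root_path = require_str(profile, "root_path", default="")
```

With this shape, a profile containing "root_path": None fails at load time instead of yielding the literal string "None".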
B-5 — partition_path allows = in key — round-trip fails
File: src/remote_store/ext/partition.py lines 65–76
Severity: Low
Description
partition_path validates that = is not in the partition value (lines
73–74) but does not validate the partition key. A key containing =
produces a segment like col=x=val with two = signs, which
parse_partition cannot parse (it requires exactly one = per segment).
Reproduction
from remote_store.ext.partition import partition_path, parse_partition
path = partition_path("data.parquet", **{"col=x": "val"})
# 'col=x=val/data.parquet'
parsed = parse_partition(path)
print(parsed.partitions) # {} — key lost
print(parsed.filename) # 'col=x=val/data.parquet' — entire path is filename
Execution result
partition_path output: 'col=x=val/data.parquet'
Partitions recovered: {}
Filename recovered: 'col=x=val/data.parquet'
Round-trip: FAILED
Root cause
Line 73 checks if "=" in str_value for values but there is no corresponding
check for keys. _is_partition_segment at lines 122–124 requires
segment.count("=") == 1, so any key with = produces unparseable output.
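A symmetric check on keys and values could be sketched as below. The functions `build_segment` and `is_partition_segment` are hypothetical toy versions written for illustration. They mirror only the two rules described above (no = in either part, exactly one = per segment) and are not the real partition_path or parse_partition code.

```python
def build_segment(key: str, value: object) -> str:
    """Build a 'key=value' segment, rejecting '=' in either part."""
    str_value = str(value)
    if "=" in key:
        raise ValueError(f"partition key must not contain '=': {key!r}")
    if "=" in str_value:
        raise ValueError(f"partition value must not contain '=': {str_value!r}")
    return f"{key}={str_value}"


def is_partition_segment(segment: str) -> bool:
    """Toy version of the parser rule: exactly one '=' per segment."""
    return segment.count("=") == 1


# A valid segment satisfies the parser rule, so it can be split back.
segment = build_segment("col", "val")
key, _, value = segment.partition("=")
```

Rejecting the key at build time keeps every emitted segment parseable, which is the property the round-trip depends on.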
Notes
- Azure start_copy_from_url race condition: The non-HNS code paths in backends/_azure.py lines 661–662 (move) and 681 (copy) call start_copy_from_url() without waiting for the async server-side copy to complete before deleting the source (move) or returning (copy). This is a real data-loss risk for large blobs or cross-account copies but cannot be reproduced without an Azure environment.
- ChecksumWriter / ProgressWriter partial-write semantics: Both wrappers in ext/streams.py update hash/callback based on input data length rather than actual bytes written. This is explicitly documented as a known limitation ("Assumes buffered I/O semantics"). Users wrapping RawIOBase streams will get incorrect checksums/progress. Noted here for completeness.
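The partial-write limitation in the second note can be illustrated with a toy raw stream. The classes `PartialWriter` and `CountingChecksumWriter` are hypothetical and are not the wrappers in ext/streams.py. The sketch shows hashing only the bytes the inner stream reports as written, relying on the fact that `io.RawIOBase.write` may accept fewer bytes than it was given.

```python
import hashlib
import io


class PartialWriter(io.RawIOBase):
    """Toy raw stream that accepts at most `limit` bytes per write() call."""

    def __init__(self, limit: int) -> None:
        super().__init__()
        self.limit = limit
        self.received = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, data) -> int:
        chunk = bytes(data[: self.limit])
        self.received += chunk
        return len(chunk)


class CountingChecksumWriter:
    """Hypothetical wrapper that hashes only what the inner stream accepted."""

    def __init__(self, inner, hasher) -> None:
        self._inner = inner
        self._hasher = hasher

    def write(self, data) -> int:
        written = self._inner.write(data)
        if written is None:  # a non-blocking raw stream accepted nothing
            written = 0
        self._hasher.update(bytes(data[:written]))
        return written

    def hexdigest(self) -> str:
        return self._hasher.hexdigest()


raw = PartialWriter(limit=4)
writer = CountingChecksumWriter(raw, hashlib.sha256())
accepted = writer.write(b"hello world")  # only the first 4 bytes are taken
```

A caller would still need to loop until all data is accepted. The point is only that the digest tracks `raw.received` rather than the length of the input.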