Caching Guide
ext.cache wraps a Store in a caching proxy that reduces backend
round-trips for read-heavy and metadata-heavy workloads.
Quick Start

```python
from remote_store import Store, cache
from remote_store.backends import MemoryBackend

store = Store(MemoryBackend())
store.write("config.json", b'{"key": "value"}')

# Wrap with a 5-minute cache
cached = cache(store, ttl=300)

# First call hits the backend
data = cached.read_bytes("config.json")

# Second call returns from cache (no backend I/O)
data = cached.read_bytes("config.json")

# Writes automatically invalidate the cache
cached.write("config.json", b'{"key": "new"}', overwrite=True)

# Next read goes to the backend again
data = cached.read_bytes("config.json")  # b'{"key": "new"}'
```
What Gets Cached

| Operation | Cached? | Notes |
|---|---|---|
| `exists()` | Yes | Including False results |
| `is_file()` | Yes | |
| `is_folder()` | Yes | |
| `read_bytes()` | Yes | Subject to `max_content_size` |
| `get_file_info()` | Yes | |
| `get_folder_info()` | Yes | |
| `list_files()` | Yes | Materialized on first call; subject to `max_listing_size` |
| `list_folders()` | Yes | Materialized on first call; subject to `max_listing_size` |
| `iter_children()` | Yes | Materialized on first call; subject to `max_listing_size` |
| `glob()` | Yes | Materialized on first call; subject to `max_listing_size` |
| `read()` | No | Returns BinaryIO stream |

read() is deliberately not cached because it returns a BinaryIO
stream that may be lazily consumed. Use read_bytes() when you want
cached content reads.
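
To illustrate why a stream makes a poor cache value, the following standard-library sketch (independent of remote_store) shows that a BinaryIO object is stateful: once consumed, a second reader gets nothing, whereas a bytes value can be handed out repeatedly.

```python
import io

# A BinaryIO stream keeps a read position that advances as it is consumed.
stream = io.BytesIO(b'{"key": "value"}')

first = stream.read()   # consumes the whole stream
second = stream.read()  # position is at the end, so nothing is left

# Bytes are immutable, so the same object can safely serve every caller.
payload = b'{"key": "value"}'
copies = [payload, payload]
```

A cache that stored the stream itself would hand the exhausted object to the second caller, which is the failure that leaving `read()` uncached avoids.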
Automatic Invalidation
Mutating operations automatically invalidate affected cache entries:
- `write`, `write_atomic`, `open_atomic`: invalidate the written path and all listing/folder caches.
- `delete`: invalidate the deleted path and all listings.
- `delete_folder`: clear the entire cache (folder deletion can affect any cached path).
- `move`: invalidate both source and destination paths plus listings.
- `copy`: invalidate destination path plus listings.
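
The write rule above can be pictured with a toy model. This is a minimal sketch, not remote_store's implementation: `ToyCachedReader`, its dict-backed "backend", and the `backend_reads` counter are hypothetical names introduced only for illustration.

```python
import time


class ToyCachedReader:
    """Illustrative TTL cache in front of a dict-backed backend (hypothetical)."""

    def __init__(self, backend, ttl):
        self._backend = backend   # plain dict: path -> bytes
        self._ttl = ttl
        self._content = {}        # path -> (expires_at, bytes)
        self._listing = None      # (expires_at, sorted paths) or None
        self.backend_reads = 0    # counts content reads that reach the backend

    def read_bytes(self, path):
        entry = self._content.get(path)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]       # fresh cache hit
        self.backend_reads += 1
        data = self._backend[path]
        self._content[path] = (time.monotonic() + self._ttl, data)
        return data

    def list_files(self):
        if self._listing is not None and self._listing[0] > time.monotonic():
            return self._listing[1]
        names = sorted(self._backend)
        self._listing = (time.monotonic() + self._ttl, names)
        return names

    def write(self, path, data):
        self._backend[path] = data
        self._content.pop(path, None)  # invalidate the written path
        self._listing = None           # invalidate cached listings


backend = {"config.json": b"old"}
reader = ToyCachedReader(backend, ttl=300)
reader.read_bytes("config.json")     # miss: goes to the backend
reader.read_bytes("config.json")     # hit: served from the cache
reader.list_files()                  # listing is now cached
reader.write("new.txt", b"x")        # drops the cached listing
reader.write("config.json", b"new")  # drops the cached content
```

After the two writes, the next `read_bytes("config.json")` and `list_files()` calls both go back to the backend and see the new state.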
Limiting Memory Usage
By default, read_bytes() caches files of any size. For workloads with
large files, pass max_content_size to cache() to prevent memory pressure.
Files larger than the limit are still returned correctly; they just bypass the cache.
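
The bypass behavior can be sketched in a few lines. This is a toy model under stated assumptions: `read_with_limit` is a hypothetical helper, the size is measured as `len(data)` in bytes, and the limit is treated as inclusive. None of these details are confirmed for remote_store.

```python
def read_with_limit(path, backend, cache_dict, max_content_size=None):
    """Return the content for path, caching it only if it fits the limit."""
    if path in cache_dict:
        return cache_dict[path]
    data = backend[path]
    # Assumption: a limit of None means "cache everything".
    if max_content_size is None or len(data) <= max_content_size:
        cache_dict[path] = data
    return data  # oversized content is still returned, just not stored


backend = {"small.txt": b"abc", "big.bin": b"x" * 1000}
cache_dict = {}
small = read_with_limit("small.txt", backend, cache_dict, max_content_size=100)
big = read_with_limit("big.bin", backend, cache_dict, max_content_size=100)
```

Both calls return the full content, but only `small.txt` ends up in `cache_dict`.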
Similarly, listing operations (list_files, list_folders,
iter_children, glob) cache their full result set. For stores with
very large directories, set max_listing_size to skip caching listings
that exceed a given item count:
```python
# Cache listings up to 500 items; larger ones bypass the cache
cached = cache(store, ttl=300, max_listing_size=500)
```
Both limits can be combined by passing max_content_size and
max_listing_size in the same cache() call.
To limit the total number of cache entries regardless of type, use
max_entries. When the limit is exceeded, the least-recently-used entry is evicted.
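
Least-recently-used eviction can be illustrated with the standard library. The `ToyLRU` class below is a hypothetical sketch of the policy, not the data structure remote_store actually uses.

```python
from collections import OrderedDict


class ToyLRU:
    """Bounded mapping that evicts the least-recently-used key (illustrative)."""

    def __init__(self, max_entries):
        self._max = max_entries
        self._data = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._max:
            self._data.popitem(last=False)  # evict the least recently used

    def keys(self):
        return list(self._data)


lru = ToyLRU(max_entries=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now more recent than "b"
lru.put("c", 3)  # exceeds the limit, so "b" is evicted
```

Reading an entry counts as a use, so frequently read paths survive while cold ones are dropped first.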
Cache Statistics
Manual Invalidation

```python
# Invalidate a specific path
cached.invalidate("config.json")

# Clear the entire cache
cached.clear_cache()
```
Stale Data
The cache cannot detect writes made by other processes or other Store instances sharing the same backend. If your workload involves external mutations, either:
- Set a short TTL (e.g., `ttl=10`).
- Call `invalidate(path)` or `clear_cache()` when you know external writes occurred.
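
The staleness problem and the manual fix can be reproduced with a toy model. The dict-based `backend` and the module-level `read_bytes` and `invalidate` functions here are hypothetical stand-ins, not remote_store calls.

```python
backend = {"config.json": b"v1"}
cached = {}  # path -> bytes; no TTL, to keep the sketch short


def read_bytes(path):
    if path not in cached:
        cached[path] = backend[path]
    return cached[path]


def invalidate(path):
    cached.pop(path, None)


before = read_bytes("config.json")  # populates the cache with b"v1"
backend["config.json"] = b"v2"      # external write that bypasses the cache
stale = read_bytes("config.json")   # still the old cached value
invalidate("config.json")           # caller knows an external write happened
fresh = read_bytes("config.json")   # re-read from the backend
```

A short TTL bounds how long the `stale` value can be served; explicit invalidation removes it immediately.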
Composing with Observability
CachedStore composes with ObservedStore. The ordering determines
what gets observed:

```python
from remote_store import observe, cache

# `store` is an existing Store; `my_hook` is your own callback.

# Observe only cache misses (actual backend calls)
cached = cache(observe(store, on_read=my_hook), ttl=300)

# Observe all reads including cache hits
observed = observe(cache(store, ttl=300), on_read=my_hook)
```
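
The effect of wrapper ordering can be checked with a self-contained toy. `Backend`, `Observed`, and `Cached` below are hypothetical minimal wrappers written for this sketch; they only mimic the layering, not the real CachedStore or ObservedStore classes.

```python
class Backend:
    def read_bytes(self, path):
        return b"data"


class Observed:
    """Calls a hook on every read that passes through this layer."""

    def __init__(self, inner, on_read):
        self._inner = inner
        self._on_read = on_read

    def read_bytes(self, path):
        self._on_read(path)
        return self._inner.read_bytes(path)


class Cached:
    """Forwards a read to the inner layer only on a cache miss."""

    def __init__(self, inner):
        self._inner = inner
        self._cache = {}

    def read_bytes(self, path):
        if path not in self._cache:
            self._cache[path] = self._inner.read_bytes(path)
        return self._cache[path]


miss_calls, all_calls = [], []
inner_observed = Cached(Observed(Backend(), miss_calls.append))
outer_observed = Observed(Cached(Backend()), all_calls.append)

for _ in range(3):
    inner_observed.read_bytes("config.json")
    outer_observed.read_bytes("config.json")
```

With the observer inside the cache, the hook fires once (the single miss); with the observer outside, it fires on all three reads.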
Thread Safety
CachedStore and MemoryCache are thread-safe. They work correctly
with batch_exists(concurrent=True) and similar concurrent access
patterns.
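
One common way to make a cache safe under concurrent access is to guard it with a lock. The sketch below assumes that approach purely for illustration; `LockedCache` is a hypothetical class, and how remote_store synchronizes internally is not stated here.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class LockedCache:
    """Toy cache whose lookups and loads are serialized by one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}
        self.loads = 0  # how many times the loader actually ran

    def get_or_load(self, key, loader):
        # Simplification: the lock is held while loading, so each key
        # is loaded exactly once even when many threads ask for it.
        with self._lock:
            if key not in self._data:
                self.loads += 1
                self._data[key] = loader(key)
            return self._data[key]


toy_cache = LockedCache()
paths = ["a.txt", "b.txt"] * 50

with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(
        pool.map(lambda p: toy_cache.get_or_load(p, str.upper), paths)
    )
```

Holding the lock during the load keeps the example short but serializes backend calls; a production cache would typically trade that for finer-grained locking.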