Research: Example Testing Across Language Ecosystems

Date: 2026-03-06
Context: Evaluating how other ecosystems test examples/tutorials to inform our approach for examples/*.py scripts in remote-store.

1. Python Ecosystem

Python doctest (stdlib)

Code in docstrings prefixed with >>> is extracted and run by doctest, and the expected output follows on the next line(s). pytest runs these examples with --doctest-modules, which can be made the default through addopts in pytest.ini.

Pros: Zero dependencies (stdlib); examples appear in help() output and Sphinx docs.
Cons: Fragile string comparison, whitespace/repr sensitivity, poor error messages, doesn't scale to complex examples.
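A minimal standard-library sketch of the mechanism. `parse_size` is a hypothetical helper whose docstring carries three examples; `run_doctests` collects and runs them with `doctest.DocTestFinder` and `doctest.DocTestRunner`, roughly what pytest --doctest-modules does for each module. The third example uses the built-in `+NORMALIZE_WHITESPACE` directive, one of doctest's own options for softening the string-comparison fragility listed above.

```python
import doctest


def parse_size(text: str) -> int:
    """Convert a size such as "4KB" into a number of bytes (hypothetical helper).

    >>> parse_size("512")
    512
    >>> parse_size("4KB")
    4096
    >>> [parse_size(s) for s in ("1KB", "2KB")]  # doctest: +NORMALIZE_WHITESPACE
    [1024,
     2048]
    """
    factors = {"KB": 1024, "MB": 1024**2}
    for suffix, factor in factors.items():
        if text.endswith(suffix):
            return int(text[: -len(suffix)]) * factor
    return int(text)


def run_doctests(obj) -> doctest.TestResults:
    """Find the >>> examples in obj's docstring, run them, and return the totals."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    for test in finder.find(obj, obj.__name__, globs={obj.__name__: obj}):
        runner.run(test)
    return runner.summarize(verbose=False)


if __name__ == "__main__":
    print(run_doctests(parse_size))
```

Without the directive, the third example would fail even though the value is right, because doctest compares the printed text character by character.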

pytest-examples (Pydantic)

Extracts Python code blocks from markdown files and from docstrings in .py files, runs them, and compares printed output against #> comments placed after the print() calls. --update-examples rewrites the #> markers from the actual output.

```python
content = store.read_bytes("hello.txt")
print(content)
#> b'Hello, world!'
```

Pros: Output validated, auto-update keeps examples current.
Cons: Extra dependency, #> syntax visible to readers.
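To make the #> convention concrete, here is a small standard-library sketch of the idea. It is not pytest-examples' implementation or API: `check_output_markers` is a hypothetical helper that executes a snippet, captures what it prints, and compares the printed lines, in order, with the #> comments. A plain bytes literal stands in for the remote-store `store.read_bytes` call so the snippet runs on its own.

```python
import contextlib
import io

MARKER = "#>"


def expected_output(source: str) -> list[str]:
    """Collect the text after each '#>' marker, in source order."""
    return [
        line.strip()[len(MARKER):].strip()
        for line in source.splitlines()
        if line.strip().startswith(MARKER)
    ]


def check_output_markers(source: str) -> list[str]:
    """Run source and return mismatch messages; an empty list means it passed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(compile(source, "<example>", "exec"), {})
    # Both sides are whitespace-stripped before comparison.
    actual = [line.strip() for line in buffer.getvalue().splitlines()]
    expected = expected_output(source)

    problems = [
        f"output line {n}: expected {want!r}, got {got!r}"
        for n, (want, got) in enumerate(zip(expected, actual), start=1)
        if want != got
    ]
    if len(expected) != len(actual):
        problems.append(f"expected {len(expected)} output lines, got {len(actual)}")
    return problems


EXAMPLE = """\
content = b"Hello, world!"  # stand-in for store.read_bytes("hello.txt")
print(content)
#> b'Hello, world!'
"""

if __name__ == "__main__":
    print(check_output_markers(EXAMPLE) or "all #> markers match")
```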

xdoctest

Drop-in replacement for doctest with better parsing (AST-based), clearer error messages, and support for multi-line statements without ... prefixes.

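Sybil

Parses and evaluates code blocks and doctest examples in reStructuredText and Markdown documentation (and in Python docstrings), with pluggable parsers for custom block types. Integrates with pytest and unittest, so documentation examples run as part of the normal test suite.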

subprocess smoke tests (what we do now)

Run python examples/foo.py in a subprocess and check the exit code. Simple and reliable, but it only verifies "no crash," not correctness.
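For comparison, the current approach reduces to a few lines of subprocess code. This sketch assumes a flat examples/ directory of *.py scripts; `run_example` and `smoke_test` are hypothetical names, and in a pytest suite the same call would more likely sit in a test parametrized over the scripts. Using sys.executable runs each example with the same interpreter and virtual environment as the test run.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_example(script: Path, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run one example in a fresh interpreter, capturing stdout/stderr as text."""
    return subprocess.run(
        [sys.executable, str(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def smoke_test(examples_dir: Path) -> dict[str, int]:
    """Map each *.py script in examples_dir to its exit code (0 means "did not crash")."""
    return {
        script.name: run_example(script).returncode
        for script in sorted(examples_dir.glob("*.py"))
    }


if __name__ == "__main__":
    # Demo on two throwaway scripts rather than the real examples/ directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "ok.py").write_text("print('hello')\n")
        (root / "broken.py").write_text("raise RuntimeError('boom')\n")
        print(smoke_test(root))
```

The exit code is the only signal here; a script that prints the wrong value still returns 0.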

2. Rust: `rustdoc` Doc-Tests

Code blocks inside /// doc comments are compiled and run by rustdoc, and cargo test runs these doc-tests automatically for library crates.

Lines prefixed with # are compiled but hidden from rendered docs.
Annotations: ignore (skip), no_run (compile only), should_panic.
2024 edition: doc-tests are merged into a single binary where possible, for speed.

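Key insight: The #-hiding mechanism lets examples look clean while still being complete programs. Because doc-tests run in CI, examples are guaranteed to compile and, unless marked no_run or ignore, to run without panicking, including any assert! calls they contain.

Used by: the standard library, serde, regex, tokio, and widely across published crates.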
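rustdoc's hiding rule can be mimicked for Python examples. Because a leading # already means "comment" in Python, this sketch assumes a hypothetical trailing `# hidden` marker instead: `render` drops marked lines for the docs, while `run` executes every line so the example remains a complete, checkable program. Both function names are illustrative only.

```python
HIDDEN = "# hidden"  # hypothetical marker; not a Python or rustdoc convention


def render(example: str) -> str:
    """Text for the docs: lines ending in the marker are dropped."""
    return "\n".join(
        line for line in example.splitlines() if not line.rstrip().endswith(HIDDEN)
    )


def run(example: str) -> dict:
    """Execute every line, hidden setup included, and return the namespace."""
    namespace: dict = {}
    exec(compile(example, "<example>", "exec"), namespace)
    return namespace


EXAMPLE = """\
data = {"hello.txt": b"Hello, world!"}  # hidden
content = data["hello.txt"]
assert content == b"Hello, world!"
"""

if __name__ == "__main__":
    print(render(EXAMPLE))  # what readers see
    run(EXAMPLE)  # what CI executes; raises AssertionError if the example is wrong
```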

3. Go: `Example` Test Functions

Functions named ExampleFoo in _test.go files are compiled and run by go test. A trailing // Output: comment is compared against stdout.

```go
func ExampleReverse() {
    fmt.Println(stringutil.Reverse("hello"))
    // Output: olleh
}
```

A // Unordered output: comment instead matches the lines in any order.
Examples without // Output: are compiled but not executed.
Examples appear in godoc automatically.

Key insight: The naming convention is the mechanism. There are no annotations or special markup beyond the // Output: comment, which is both assertion and documentation. Arguably the most elegant design of any language surveyed here.

Tradeoff: Only tests stdout.
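Go's convention can be approximated in Python to show how little machinery it needs. The conventions here are hypothetical adaptations, not an existing Python tool: functions named example_* are discovered purely by name, an `Output:` or `Unordered output:` section at the start of the docstring plays the role of Go's trailing comment, and examples without one are skipped, mirroring Go's compile-only behavior.

```python
import contextlib
import inspect
import io


def example_reverse():
    """Output: olleh"""
    print("hello"[::-1])


def example_keys():
    """Unordered output:
    b
    a
    """
    for key in {"a": 1, "b": 2}:
        print(key)


def expected_of(func):
    """Return (expected lines, unordered?) from func's docstring, or None if absent."""
    doc = inspect.getdoc(func) or ""
    for header, unordered in (("Unordered output:", True), ("Output:", False)):
        if doc.startswith(header):
            lines = doc[len(header):].strip().splitlines()
            return [line.strip() for line in lines], unordered
    return None


def run_examples(namespace: dict) -> dict[str, bool]:
    """Run each example_* function that declares output; map its name to pass/fail."""
    results = {}
    for name, func in sorted(namespace.items()):
        if not (name.startswith("example_") and callable(func)):
            continue  # the naming convention is the only discovery mechanism
        spec = expected_of(func)
        if spec is None:
            continue  # like Go: no output section means "compile only", not run
        want, unordered = spec
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            func()
        got = [line.strip() for line in buffer.getvalue().splitlines()]
        results[name] = sorted(got) == sorted(want) if unordered else got == want
    return results


if __name__ == "__main__":
    print(run_examples(dict(globals())))
```

The same stdout-only tradeoff applies: anything an example does not print is invisible to the check.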

4. Elixir: ExUnit Doctests

iex> lines in @doc strings are parsed by ExUnit.DocTest, with the expected result on the following line. Calling doctest MyModule inside a test module enables them.

Blank lines between iex> blocks create separate test cases.
Exception testing via ** (ExceptionName) prefix.
doctest_file "README.md" runs doctests from markdown files.

Key insight: Explicitly positioned as "documentation that happens to be tested," not "tests that happen to be documentation."
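Python's doctest has a direct counterpart to Elixir's ** (ExceptionName) lines: an expected traceback that starts with `Traceback (most recent call last):` and ends with the exception line, with the stack in between elided because doctest ignores it. `safe_key` is a hypothetical validator used only for illustration, not part of remote-store; this time the docstring is run through `doctest.DocTestParser` directly.

```python
import doctest


def safe_key(key: str) -> str:
    """Reject keys that would escape the store root (hypothetical validator).

    >>> safe_key("reports/2026.csv")
    'reports/2026.csv'
    >>> safe_key("../etc/passwd")
    Traceback (most recent call last):
        ...
    ValueError: key escapes the store root: '../etc/passwd'
    """
    if key.startswith("/") or ".." in key.split("/"):
        raise ValueError(f"key escapes the store root: {key!r}")
    return key


def run_docstring(func) -> doctest.TestResults:
    """Parse func's docstring into one DocTest and run it."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(func.__doc__, {func.__name__: func}, func.__name__, None, 0)
    return doctest.DocTestRunner(verbose=False).run(test)


if __name__ == "__main__":
    print(run_docstring(safe_key))
```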

5. Haskell: `doctest` Package

>>> lines in Haddock comments run via GHCi. Output on following lines is compared. Examples within a comment share scope.

Pros: Pure REPL transcript approach.
Cons: Slow (GHCi reload per group). doctest-parallel and cabal-docspec exist to address performance.

6. JavaScript/TypeScript

No built-in mechanism. Ad-hoc tools exist:

markdown-doctest: Extracts JS blocks from markdown, checks no-throw only.
doctest-ts: // => assertions in TS source translated to test files.
davidchambers/doctest: // > expressions in JS source evaluated.

Major JS projects (Express, axios) generally don't test README examples. Prisma maintains a separate prisma-examples repo with full runnable projects.

Key insight: Without a standard mechanism, JS examples tend to drift out of date.
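The markdown-doctest idea translates directly to Python, for example for a README. This sketch has the same limitation (it only proves that each block runs without raising). `python_blocks` and `check_no_throw` are hypothetical names; the regex recognizes only backtick fences whose info string starts with python, and each block runs in a fresh namespace, so blocks that build on earlier ones would need a shared dict instead.

```python
import re

# Built programmatically so this listing contains no literal fence of its own.
TICKS = "`" * 3
FENCE = re.compile(
    "^" + TICKS + r"python[^\n]*\n(.*?)^" + TICKS + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def python_blocks(markdown: str) -> list[str]:
    """Return the source of every python-tagged fenced block, in document order."""
    return FENCE.findall(markdown)


def check_no_throw(markdown: str) -> list[tuple[int, str]]:
    """Execute each block in a fresh namespace; report (block number, error) pairs."""
    failures = []
    for number, source in enumerate(python_blocks(markdown), start=1):
        try:
            exec(compile(source, f"<block {number}>", "exec"), {})
        except Exception as exc:  # "did not raise" is the only check, as in markdown-doctest
            failures.append((number, f"{type(exc).__name__}: {exc}"))
    return failures


if __name__ == "__main__":
    doc = f"Intro\n{TICKS}python\nx = 1\n{TICKS}\n\n{TICKS}python\nint('ten')\n{TICKS}\n"
    print(check_no_throw(doc))
```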

7. Java/Kotlin

JEP 413 @snippet tag (JDK 18+): JavaDoc can embed code inline or include a named region of an external source file, and that external file can be compiled and run as part of the test suite. Tested code IS documented code.

evitaDB pattern: JUnit 5 DynamicContainer extracts code blocks from markdown, compiles via JShell, runs as parameterized tests.

Key insight: Java's @snippet is architecturally sound — reference by path, no extraction needed.
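A Python analog of the reference-by-path approach: the documentation pulls a named region out of a file the test suite already executes. The `# @start region="..."` and `# @end` markers are modeled on JEP 413's @start/@end tags but are a hypothetical convention here, and `extract_region` is an illustrative helper; wiring it into a docs build (for instance a Sphinx or MkDocs plugin) is not shown.

```python
import re
import textwrap

START = re.compile(r'#\s*@start\s+region="(?P<name>[^"]+)"')
END = re.compile(r"#\s*@end\b")


def extract_region(source: str, region: str) -> str:
    """Return the dedented lines between a named @start marker and the next @end."""
    lines = source.splitlines()
    for index, line in enumerate(lines):
        match = START.search(line)
        if match and match.group("name") == region:
            body = []
            for inner in lines[index + 1:]:
                if END.search(inner):
                    return textwrap.dedent("\n".join(body))
                body.append(inner)
            raise ValueError(f"region {region!r} has no closing @end")
    raise ValueError(f"region {region!r} not found")


# A test module the suite already runs; the docs show only the marked region.
TESTED_FILE = '''\
def test_round_trip():
    store = {}  # stand-in for a real store fixture
    # @start region="write-read"
    store["hello.txt"] = b"Hello, world!"
    assert store["hello.txt"] == b"Hello, world!"
    # @end
'''

if __name__ == "__main__":
    print(extract_region(TESTED_FILE, "write-read"))
```

Unlike JEP 413, this sketch supports neither nested regions nor named @end tags.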

8. Language-Agnostic Tools

LLVM `lit`

RUN: lines in test-file comments specify shell commands, typically piping output into FileCheck, which matches CHECK:, CHECK-NEXT:, CHECK-NOT:, and CHECK-DAG: patterns against it. lit runs tests in parallel, and XFAIL marks known-broken tests.
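To illustrate how directive-based matching differs from exact output comparison, here is a deliberately small Python subset. It is not FileCheck: it supports only CHECK: and CHECK-NEXT:, matches plain substrings instead of FileCheck's pattern syntax, and works line by line. The directives could live as comments inside an example script, the way lit keeps RUN: lines in the test file; `file_check` is a hypothetical name.

```python
import re

# Matches "CHECK: text" or "CHECK-NEXT: text" anywhere on a line, e.g. after "# ".
DIRECTIVE = re.compile(r"\b(CHECK|CHECK-NEXT):(.*)$")


def file_check(output: str, check_source: str) -> None:
    """Match output lines against the CHECK:/CHECK-NEXT: directives in check_source.

    CHECK: needs the substring on some line after the previous match;
    CHECK-NEXT: needs it on the line immediately after the previous match.
    Raises AssertionError for the first directive that does not match.
    """
    lines = output.splitlines()
    last = -1  # index of the output line matched by the previous directive
    for raw in check_source.splitlines():
        found = DIRECTIVE.search(raw)
        if found is None:
            continue
        kind, pattern = found.group(1), found.group(2).strip()
        if kind == "CHECK-NEXT":
            candidates = [last + 1] if last >= 0 else []  # cannot be the first directive
        else:
            candidates = range(last + 1, len(lines))
        for index in candidates:
            if index < len(lines) and pattern in lines[index]:
                last = index
                break
        else:
            raise AssertionError(f"{kind}: {pattern!r} did not match")


if __name__ == "__main__":
    printed = "connecting\nwrote hello.txt (13 bytes)\nread back 13 bytes\n"
    file_check(printed, "# CHECK: wrote hello.txt\n# CHECK-NEXT: read back 13 bytes")
    print("all checks matched")
```

Because the patterns are partial, an example can print timestamps or other unstable details without breaking the check, which exact #> comparison cannot tolerate.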

Cram Tests

.t files contain shell transcripts: indented lines starting with $ are commands, and the indented lines that follow are the expected output. A (re) suffix marks an output line as a regex, (glob) as a glob pattern.

Killer feature: cram -i shows each output diff and lets you accept the actual output interactively. OCaml's Dune adopted the cram format, with dune promote as the accept step.
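The accept workflow is easy to sketch for #> markers. `promote` is a hypothetical helper, not pytest-examples' implementation or the behavior of cram -i: it runs each top-level statement in turn, discards any existing #> lines, and writes what each statement actually printed directly after it. A real tool would show a diff and then write the result back to the file.

```python
import ast
import contextlib
import io

MARKER = "#>"


def promote(source: str) -> str:
    """Return source with its '#>' lines regenerated from the output actually printed.

    Top-level statements run one at a time in a shared namespace, and whatever a
    statement prints is written as '#>' lines right after its last line.
    """
    namespace: dict = {}
    output_after: dict = {}  # 1-based last line of a statement -> printed lines
    for stmt in ast.parse(source).body:
        module = ast.Module(body=[stmt], type_ignores=[])
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(compile(module, "<example>", "exec"), namespace)
        printed = buffer.getvalue().splitlines()
        if printed:
            output_after[stmt.end_lineno] = printed

    result = []
    for number, line in enumerate(source.splitlines(), start=1):
        if line.strip().startswith(MARKER):
            continue  # existing (possibly stale) output is discarded
        result.append(line)
        result.extend(f"{MARKER} {text}" for text in output_after.get(number, []))
    return "\n".join(result) + "\n"


STALE = """\
greeting = "Hello, world!"
print(greeting.upper())
#> Hello, world!
for n in range(2):
    print(n)
"""

if __name__ == "__main__":
    print(promote(STALE), end="")
```

Running it twice yields the same text, which is what makes an accept step safe to run routinely.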

mdBook

mdbook test passes Rust code blocks to rustdoc --test. Reuses rustdoc infrastructure for free.

Summary Table

| Approach | Mechanism | Strength | Weakness |
|---|---|---|---|
| Rust rustdoc | Doc comment code compiled+run | Example IS test | Slow per-example compilation (reduced by 2024-edition merging) |
| Go Examples | Naming convention + `// Output:` | Elegant, zero config | Stdout-only |
| Elixir doctest | `iex>` in `@doc` strings | First-class in ExUnit | No sandbox isolation |
| Haskell doctest | `>>>` in Haddock, GHCi | Pure REPL transcript | Slow |
| JS markdown-doctest | Extract from .md | Better than nothing | No-crash only |
| Java `@snippet` | JavaDoc refs tested files | Clean architecture | JDK 18+ |
| Cram | Shell transcripts in `.t` | Interactive accept workflow | CLI-only |
| pytest-examples | `#>` markers in .py | Auto-update output | Extra dependency |

Cross-Cutting Insights

The best systems (Rust, Go, Elixir) make the tested code and the rendered documentation the same artifact, rather than keeping examples separate from tests.
Hiding boilerplate is critical: Rust's # lines, Go's implicit main, Elixir's shared module context.
Output-based testing (// Output:, Cram) is the simplest assertion mechanism but limits what you can test.
The "accept actual output" workflow (Cram -i, Dune promote, pytest-examples --update-examples) dramatically reduces maintenance friction.
Java's @snippet (reference external tested files) is the cleanest for compiled languages where inline evaluation is impractical.

Recommendations for remote-store

Option A: Inline assertions (simplest, recommended now). Add assert statements to existing examples/*.py scripts. Zero dependencies, examples stay runnable standalone, immediate value.

Option B: Go-style output markers + pytest-examples. Adopt #> markers and use --update-examples to auto-sync them. Good if we want output validation.

Option C: Doctest in docstrings (longer-term). Small examples in API docstrings, run with pytest --doctest-modules. Closest to the Rust/Go/Elixir ideal for core methods. Keep standalone examples as tutorials with inline assertions (Option A).

Suggested path: A now, C later as the API stabilizes.
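A sketch of what Option A could look like in a single example script. `InMemoryStore` and its write_bytes/exists methods are hypothetical stand-ins so the file runs without remote-store; only read_bytes mirrors the call shown earlier. Note that assert statements are skipped when Python runs with -O, so whatever runs the examples should not pass that flag.

```python
"""examples/hello_store.py (hypothetical): a tutorial script that checks itself."""


class InMemoryStore:
    """Minimal stand-in exposing write_bytes/read_bytes/exists (hypothetical API)."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}

    def write_bytes(self, key: str, data: bytes) -> None:
        self._files[key] = data

    def read_bytes(self, key: str) -> bytes:
        return self._files[key]

    def exists(self, key: str) -> bool:
        return key in self._files


def main() -> None:
    store = InMemoryStore()  # a real example would construct a remote-store store here

    store.write_bytes("hello.txt", b"Hello, world!")
    content = store.read_bytes("hello.txt")
    print(content)

    # Inline assertions upgrade the smoke test from "no crash" to "correct result".
    assert content == b"Hello, world!", content
    assert store.exists("hello.txt")
    assert not store.exists("missing.txt")


if __name__ == "__main__":
    main()
```

The script still reads top to bottom as a tutorial, and the existing subprocess smoke test picks up any failed assertion as a non-zero exit code with no change to the runner.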