Documentation style guide¶
Authoritative conventions for docstrings, code comments, reference docs, and the changelog. PRs deviate from these rules only with an explicit one-paragraph rationale in the PR description.
Core principle¶
Code and reference docs describe the current state of the software. ADRs and the changelog describe its history.
A docstring on a function in src/ answers "what does this do
now?". A comment in .lychee.toml answers "why is this value set
this way?". Neither answers "how did we get here?" — that's the
job of docs/adr/ and CHANGELOG.md.
A docstring that says "added in PR #50", "previously did X", "calibrated against the v1.0.2 release cycle", or "see ADR 0038 for the three options considered" is leaking history into the present-tense surface. The reader needs to know the contract, not its provenance.
Docstrings¶
Form¶
- First line: imperative-mood single sentence ending in a period. ("Return the resolved caller email.", not "Returns the resolved caller email." and not "This function returns…")
- Blank line.
- Optional body: present-tense factual prose describing the
contract — inputs, outputs, edge cases, invariants. Use
paragraphs for distinct concerns; reST field lists (
:param x:,:returns:) only when generated tooling needs them. - No headings, no bulleted "sections" inside docstrings.
- No trailing changelog-style notes.
Allowed content¶
- What the function/class/module does, now.
- The contract's edge cases ("returns
Nonewhen the row count is zero", "raisesValidationErroron malformed input"). - Cross-references to current specifications: ADR numbers, BigQuery / DuckDB doc URLs, RFC numbers. The reference must be to a contract this code conforms to, not to a historical decision about this code.
Forbidden content¶
- Dates, version numbers, release names ("as of v1.0.2", "shipped May 2026").
- PR numbers, issue numbers, commit SHAs.
- Phase / wave / bucket labels (
P2.d,G4,Bucket A,slice-2). - "Previously X" / "we used to" / "this was added when" / "this closes".
- "TODO", "FIXME", "future work", "will eventually". If something is incomplete, either open a tracking issue and link it, or fix it now. A docstring is not a backlog.
- Author names ("originally written by X", "thanks to Y").
- Calibration narratives ("after observing 502s on PR #43, retries raised to 4 × 10s").
Examples¶
❌ Bad (history + planning + PR references):
def rewrite_session_user(bq_sql: str, caller: CallerIdentity) -> str:
"""Pre-translate BigQuery SQL for the ``SESSION_USER()`` function.
Added in PR #50 (ADR 0038) to close the canonical
RAP-via-SESSION_USER tenant-isolation pattern. Extended in
PR #62 to also handle ``CURRENT_USER()`` and ``@@session.user``
(ADR 0040). The Storage Read row_restriction path was previously
failing to thread the caller through; that limitation was
closed in the same PR.
Returns the input unchanged when no rewrite is needed.
Idempotent. TODO: handle ``SESSION_USER()`` inside SQL UDF
bodies in a future release.
"""
✅ Good (present-tense contract):
def rewrite_session_user(bq_sql: str, caller: CallerIdentity) -> str:
"""Substitute every BigQuery caller-identity call with a string literal.
Handles ``SESSION_USER()``, ``CURRENT_USER()``, and
``@@session.user``. Each call site folds to the email returned
by :func:`resolve_session_user` for ``caller``. The input is
returned unchanged when no recognised spelling is present and
when the SQL fails to parse. Idempotent: a second pass with
the same caller is a no-op because the first pass replaced
every matching node with a literal.
See [ADR 0038](../../docs/adr/0038-session-user.md) for the
resolution contract.
"""
Module-level docstrings¶
Same rules. The module docstring describes what the module is for right now — not its evolution. If a module collects routines from multiple phases of work, say what they do as a coherent surface, not when each was added.
Test docstrings¶
Identical rules apply. Test docstrings describe what the test verifies, not which PR introduced it or which CodeRabbit thread prompted it.
Code comments¶
When to write one¶
Only when the why is non-obvious from the code. Never the what — well-named identifiers cover that.
Allowed content¶
- A subtle invariant ("DuckDB returns this as a list even for single-row results; unwrap deliberately").
- A specific workaround for an upstream bug ("workaround for DuckDB issue #12345 — remove when the fix lands").
- A non-obvious performance choice with measurement.
Forbidden content¶
Same list as docstrings: dates, PRs, phases, "previously", "added in".
Example¶
❌ Bad:
# Bumped from 5 to 10 after the v1.1.0 cascade — Google Cloud
# docs CDN added a locale-prefix hop in May 2026 that ran us out
# of redirect budget. PR #63 raised the cap.
max_redirects = 10
✅ Good:
# Google Cloud docs URLs chain through CDN + locale-prefix
# redirects (typically 6–8 hops). Bumping beyond this risks
# masking infinite-loop CDNs.
max_redirects = 10
The good version answers "why this number" without timestamping the answer.
Reference docs¶
docs/reference/** and docs/architecture/** (except
docs/adr/** and docs/rfcs/**) describe how the software
behaves today. Apply the docstring rules: no history, no PR
references, no phase labels.
Auto-generated reference docs (conformance-coverage-matrix,
compatibility-matrix, sql-function-mapping, api-coverage)
are governed by their generator scripts; the same rules apply to
those generators.
ADRs¶
ADRs in docs/adr/** are exempt from these rules — historical
decision records are exactly what the rules above push out of the
code. An ADR's job is to record what was decided, why, and what
was considered. They are immutable once accepted; a later decision
that supersedes an earlier one ships as a new ADR that explicitly
supersedes it.
CHANGELOG¶
The changelog follows Keep a Changelog 1.1.0 + Common Changelog. Both are explicit specifications; deviations require justification.
Structure¶
# Changelog
## [X.Y.Z] - YYYY-MM-DD
### Changed
- Single-line imperative-mood entry, capitalised, period-terminated.
### Added
- One change per bullet.
### Removed
- Same.
### Fixed
- Same.
- Sections in the order Changed / Added / Removed / Fixed (Common Changelog ordering; breaking changes most important).
- Optional
### Deprecatedand### Securityper Keep a Changelog if the version actually has such entries. - One bullet per change. No sub-headings inside a version section.
- No paragraphs inside entries. If a change needs explanation beyond what fits in one line, that explanation belongs in an ADR or in the docstring of the code it describes — link to it, don't restate it.
Entry form¶
- Imperative mood. ("Add CURRENT_USER alias for SESSION_USER", not "Added CURRENT_USER alias" and not "This release adds…")
- Capitalised first letter, period-terminated.
- Reference link at the end is optional and discouraged. The CHANGELOG is read for what changed, not who did it.
Forbidden in CHANGELOG entries¶
- Fixture counts, PR numbers, file paths, ADR numbers, phase labels, recorder narratives, contributor names.
- Multi-paragraph essays.
- Per-entry sub-headings (
**Implementation:**,**Coverage:**,**Out of scope:**). - "This PR" / "this change" — entries are about what changed, not what work happened.
- TODOs / planned follow-up.
Examples¶
❌ Bad (current repo style):
- **`CURRENT_USER()` + `@@session.user` + Storage Read
`row_restriction` caller threading** (ADR 0040). Closes three
items deferred by ADR 0038's out-of-scope section in a single
follow-up:
1. **`CURRENT_USER()` function alias** — BigQuery documents…
2. **`@@session.user` system-variable spelling**…
3. **Storage Read `row_restriction` caller threading**…
Coverage:
- **8 new unit tests** in `tests/unit/sql/rewriter/…`
- **1 new integration test** in `tests/integration/…`
- **6 new e2e tests** (2 per client × Python / Node.js / Go / Java)…
✅ Good:
### Added
- Recognise `CURRENT_USER()` and `@@session.user` as aliases for `SESSION_USER()`.
- Thread the caller identity into Storage Read row-restriction filters.
The reader who wants more depth reads the corresponding ADR. The ADR is the durable place for "three options considered, here's why we picked C". The changelog is for "here's what changed".
Authoring cadence¶
CHANGELOG entries are authored at release time, not on every
PR. A PR's body documents what the PR did; the changelog
captures what the release shipped. At release time the operator
reads git log <prev-tag>..HEAD and synthesises one bullet per
user-visible change.
There is no ## [Unreleased] section between releases. The first
heading after # Changelog is always the most recent shipped
version.
Mutability¶
A released version's section is immutable except to fix an error (typo, wrong reference, factual mistake about what shipped). Mutability is not for adding context, expanding explanations, or back-filling work that was forgotten — that goes in a new release.
Migration path for existing content¶
This guide ships as the authoritative standard. Existing docstrings, comments, and the CHANGELOG were authored before it and will be brought into compliance through a series of focused PRs:
- A separate PR rewrites
CHANGELOG.mdend-to-end (every version section + the removal of theUnreleasedsection). - A separate PR sweeps every docstring under
src/bqemulator/**. - A separate PR sweeps
docs/reference/**anddocs/architecture/**(excluding ADRs and RFCs).
Until those PRs land, expect old code to violate this guide. New
PRs against main are expected to comply.