ADR 0041: Cyclomatic-complexity ratchet to rank B + campaign retrospective¶
- Status: Accepted
Context¶
ADR 0035 introduced the non-blocking
quality gates with the threshold deliberately permissive
(--max-absolute E). ADR 0036
ratcheted the absolute ceiling to rank C (complexity ≤ 20) and
promoted the gate to required, refactoring 10 D/E-rank functions in
the same atomic PR.
ADR 0036 §"Alternatives considered" §3 explicitly evaluated tightening
to rank B (complexity ≤ 10) and rejected it as "too strict — many
structurally reasonable Python functions land at rank B-C." That
verdict was correct at that time: an immediate B-cap would have
trapped well-designed dispatch / state-machine functions and forced
either widespread --exclude carve-outs or premature abstraction.
The C threshold ran for ~6 months (PR #44 landing → today). During that window the rank-C bucket revealed itself as a different shape than ADR 0036 anticipated:
- Most rank-C functions weren't structurally reasonable — they were the next layer of "type-dispatch grew accumulated cruft" that ADR 0036's refactors hadn't reached.
- Behaviour-preserving dispatch-table + helper-extraction patterns (Buckets A/B from ADR 0036) collapsed every rank-C function the campaign attempted — same as the D/E sweep. The bucket-C "irreducible domain complexity" escape hatch stayed at 0 hits, exactly like ADR 0036's audit list.
- The thresholds at which the dispatch pattern stops being a clear improvement turned out to be lower than rank B's 10-complexity ceiling: the helpers produced by C→B refactors land at rank A (CC 1–5) almost universally; very few of the new helper functions sit in the rank-B band themselves.
In other words: the codebase converged on a structural style where rank-A is the common case and rank-B is the "edge of natural complexity" boundary. Rank-C functions look like outliers, not typical code. Tightening the ceiling to B aligns the gate with the observed structural floor of the codebase, not against a hypothetical "refactor everything" mandate.
Baseline audit (campaign sweep PR-1 through PR-11)¶
radon cc src/bqemulator -nC -s against main at the start of the
campaign returned ~60 rank-C functions across 26 files. The
campaign closed them in 11 sequential PRs, each scoped to a single
package or subsystem so review surface stayed bounded and the
behavioural impact of each refactor was independently reviewable.
| PR | Commit | Scope | Files | C-blocks closed |
|---|---|---|---|---|
| 1 | fb46d9e (#90) |
storage type / value mappers | storage/arrow_bridge.py, storage/type_map.py |
4 |
| 2 | 045493a (#91) |
REST DTO mappers | api/routes/{datasets,routines,tables}.py |
3 |
| 3 | f1a9109 (#92) |
misc leaf blocks | 3 files | 3 |
| 4 | d7baaba (#93) |
REST request handlers | api/routes/jobs.py etc. |
5 |
| 5 | cdf5de0 (#94) |
catalog + versioning | catalog/*, versioning/* |
5 |
| 6 | 21000c3 (#95) |
builtin UDFs | sql/builtin_udfs.py |
6 |
| 7 | a17ce60 (#96) |
SQL rewriters | sql/rewriter/*.py (8 files) |
9 |
| 8 | 1fc1e14 (#98) |
SQL core | sql/{table_rewriter,parameters,translator}.py |
4 |
| 9 | f72314f (#99) |
gRPC + streaming | grpc_api/*, streaming/read_session.py |
4 |
| 10 | bf62f8e (#100) |
scripting | scripting/{lexer,parser,interpreter}.py |
8 |
| 11 | 2a87499 (#102) |
jobs | jobs/{avro_reader,upload_session_manager,error_mapper,executor}.py |
9 |
xenon --max-absolute B --max-modules C --max-average A src/bqemulator
passes against main after PR-11 landed. radon cc src/ -s -n C is
empty. Project-wide CC average sits at A (3.06) — improved from
the ~3.4 ADR 0036 reported, because the new helpers introduced by
the dispatch-table pattern overwhelmingly land at rank A (CC 1–5).
Decision¶
- Tighten the absolute ceiling to rank B in the Makefile gate:
--max-absolute B— no function above rank B (CC > 10). A new C-rank function fails CI loudly, exactly the way a new D-rank function did under the old gate.--max-modules C— stays at C. Module averages are aggregate; some modules carry justified rank-B clusters (e.g. SQL builtin-UDF rule classes) where the per-function pattern is correct but the average drifts slightly above A. Tightening this further would either force premature abstraction or push the rank-B helpers below their natural readable size.-
--max-average A— stays at A. Project-wide CC average is 3.06 (well below the 5.0 rank-A ceiling). -
No additional refactors required. The PR-1 through PR-11 campaign closed every existing C-rank function. The B-ceiling flip is a single-line config change against an already-compliant codebase. This is the same atomic "refactor first, then ratchet" discipline ADR 0036 established — the only difference is that the refactor work was spread across 11 PRs rather than one.
-
No bucket-C exclusions added. The 60+ rank-C functions closed during PR-1 through PR-11 were all bucket-A (clean refactor) or bucket-B (dispatch table). The bucket-C escape hatch from ADR 0036 stays available for any future genuine domain-shaped complexity, but the empirical rate after 70+ refactors is 0%.
-
ADR 0036 remains in effect. The C-ratchet ADR is not superseded — its decisions about the gate's promotion-to-required status, the bucket A/B/C taxonomy, and the duplication / dead-code gates' non-blocking posture all stand. This ADR layers a tighter absolute ceiling on top.
Refactor results (campaign-wide)¶
Per-PR before / after counts captured live during the work:
| PR | Functions touched | Worst rank before | Worst rank after |
|---|---|---|---|
| 1 | 4 | C (17) | A (4) |
| 2 | 3 | C (15) | A (4) |
| 3 | 3 | C (16) | A (5) |
| 4 | 5 | C (18) | A (5) |
| 5 | 5 | C (15) | A (4) |
| 6 | 6 | C (15) | A (5) |
| 7 | 9 | C (19) | A (5) |
| 8 | 4 | C (14) | A (4) |
| 9 | 4 | C (17) | A (5) |
| 10 | 8 | C (16) | A (6) |
| 11 | 9 | C (20) | A (4) |
Notable peaks (highest pre-refactor complexity in the campaign) all hit at rank C(20)—the upper edge of the rank-C band:
jobs/error_mapper.translate_runtime_errorC(20) → A(4) via_DUCKDB_TRANSLATORSdispatch tuple (PR-11)jobs/executor.execute_load_jobC(20) → ≤B via_LoadJobConfigdataclass + per-format dispatch (PR-11)jobs/executor._arrow_type_to_bq_typeC(19) → A(3) via_ARROW_TO_BQ_RULESdispatch tuple (PR-11)sql/rewriter/...C(19) functions → A/low-B via per-rule helpers (PR-7)
The new helper functions (one per extracted sub-block; one per dispatch-table handler) all land at rank A or low B — the same shape ADR 0036 observed at the D→C ratchet, repeated at scale.
Rationale¶
Why rank B, not stay at C¶
Three converging signals:
-
The C-ratchet revealed bucket-A/B exhaust at zero. ADR 0036 anticipated up to ~20% of the rank-D/E audit list might be irreducibly complex; the observed rate was 0%. PR-1 through PR-11 tested the same hypothesis at the C-band — another ~60 functions refactored with the same bucket-A/B patterns, same 0% irreducible rate. The codebase doesn't carry genuine domain-shaped complexity in the C-band that resists dispatch-table refactoring.
-
The post-refactor codebase converged on rank A. New helpers land at A (CC 1-5) almost universally. The natural structural floor of the post-campaign codebase is "rank A is common, rank B is rare." Setting the gate to rank-B caps the worst case at that observed structural ceiling.
-
Drift detection wins. A new function that lands at rank C under the old gate would silently pass; the same function fails loudly under the new gate. Given the campaign closed every existing rank-C function, "no rank-C functions" is a stable steady-state the gate can defend.
ADR 0036's original rejection of rank B was correct for its moment: without the campaign, the rank-C bucket carried justifiable "structurally reasonable Python" complexity. After the campaign, the remaining rank-C functions were the absence — there are none.
Why a multi-PR sweep, not one big bang¶
The C-band had ~60 functions to close, vs. the 10 D/E-rank functions ADR 0036 closed in a single PR. Two reasons not to repeat the big-bang shape:
-
Review surface. ADR 0036's PR landed +475/-316 lines across 6 source files. A C-band big-bang would have been +3000-5000 across 26 files, which exceeds the size at which a reviewer can reliably verify behaviour preservation per-hunk.
-
Subsystem coherence. The C-band functions clustered by subsystem (storage types, REST mappers, SQL rewriters, scripting, jobs). Per-subsystem PRs let each one ship with a focused validation gate (e.g. PR-9's full e2e-against-Storage-API matrix; PR-10's full scripting conformance run) without the distraction of unrelated refactor noise.
The trade-off: longer end-to-end timeline (~3 months wall-clock for PR-1 through PR-11), more merge-coordination overhead, and a window where main has the gate at C but is converging on B-readiness. Acceptable trade-off given the review-surface alternative.
Why we keep --max-modules at C¶
Module averages aren't the same metric as per-function ceilings. A
module with 100 rank-A functions and 1 rank-B function averages
~A, but a module with 20 rank-A functions and 5 rank-B functions
might average mid-B. Tightening --max-modules to B would surface
the latter without distinguishing "5 reasonable rank-B helpers" from
"5 ratty rank-B functions due for a rewrite."
The aggregate is a different concern from the per-function ceiling.
Today's project-wide module averages are all ≤ rank C; some modules
sit at low-B because their natural shape carries clusters of
"clean small dispatch functions" each at rank A-B. Keeping
--max-modules C lets that cluster exist; the per-function gate
catches any one of them growing too large.
Why no bucket-C exclusions emerged¶
Two structural reasons:
-
The dispatch pattern is general. Both bucket-A (helper extraction) and bucket-B (dispatch table) are language-neutral refactoring patterns that work for almost any "long if/elif chain" or "long procedural function." The codebase's complexity is overwhelmingly of those two shapes — type dispatch, REST field mapping, SQL rewriter rule application, AST traversal — which all dispatch cleanly.
-
Genuine domain complexity hides in systems, not functions. The places where the emulator carries genuinely irreducible complexity (the SQL translator pipeline, the scripting interpreter's control-flow handling, the conformance runner's comparator) are multi-function systems with rank-A or low-B individual functions. Cyclomatic complexity per-function is the wrong lens for those — they're complex by design at the interaction level, not the per-function-control-flow level.
The bucket-C exclusion escape hatch stays available — a future contributor might find a genuine SQL grammar parser node where a 20-way switch is the cleanest model. The bar is unchanged from ADR 0036: "another maintainer reading the function would agree the complexity is domain-shaped, not accumulated."
Consequences¶
Positive¶
- The codebase's worst function complexity drops from rank C (CC 20) to rank B (CC ≤ 10). The dispatch-table + helper-extraction patterns are now consistently applied across the codebase's branchy hot spots — type mappers, AST traversal, error mapping, per-format job loaders, BQ wire-format translators.
- New contributors writing a rank-C function get the same loud CI failure that ADR 0036 introduced for rank-D — the gate's per-PR signal scales with the actual codebase quality bar, not a historical baseline.
- The campaign documented the 11-PR ratchet pattern as a reusable
approach. Future ratchets (e.g. tightening
--max-modules, promoting jscpd to required, tightening vulture's confidence threshold) can follow the same per-subsystem-PR shape with predictable review-surface bounds. - Project-wide CC average improved from ~3.4 (post-ADR 0036) to 3.06. The dispatch-pattern helpers don't just preserve average complexity — they slightly improve it, because the per-function branch count is now distributed across more (simpler) functions.
Negative¶
- The gate is now strict enough that some natural-shape "small state-machine functions" might trip it. If a future contributor hits this with a genuine domain-shaped function, they have two paths: refactor (the campaign's experience says this works ~100% of the time for the patterns the codebase already carries), or open a bucket-C exclusion PR with an additive ADR section. The friction is bounded; the alternative — letting rank-C functions accumulate again — would unwind the campaign.
- The
make verifychain's complexity step continues to enforce the new threshold locally. No new local cost (xenon's runtime is dominated by I/O, not the threshold value), but a new contributor unfamiliar with the campaign might land their first rank-C function and need to read this ADR + ADR 0036 to understand the pattern. The Makefile's docstring + the dispatch-table examples in the codebase (_DUCKDB_TRANSLATORS,_ARROW_TO_BQ_RULES,_LOAD_FORMAT_HANDLERS,_STATEMENT_DISPATCH, etc.) document the expected shape.
Neutral¶
- The refactor + ratchet split (PR-1 through PR-11 closed the C-band; PR-12 flips the gate) means main went through a window where it was "rank-C-clean but gate-still-at-C." That window closes with this PR. No production effect — the emulator's runtime behaviour is identical pre- and post-campaign.
- Helper-function count grew by ~150-200 across the 11 PRs. Module
sizes drifted up by a similar margin; some files (notably
jobs/executor.pyandscripting/interpreter.py) gained ~30-50% LOC. Maintainability index per-module stayed rank-A or rank-B throughout — the additional helpers compensate for their inline-branch counterparts. - CHANGELOG entry deferred to release time per the
project's release-time-authored CHANGELOG policy.
This ADR is the authoritative record of the campaign until the
next release synthesises a CHANGELOG bullet from
git log.
Alternatives considered¶
-
Stay at rank C indefinitely. Would have left the gate defending a baseline that the codebase had outgrown. Post-campaign, the natural structural floor is rank A — a gate at rank C carries no defensive value once the rank-C band is empty.
-
Skip ADR 0041, fold the gate flip into PR-11. Considered; rejected. The doc retrospective + Makefile flip is a separate logical change from "refactor jobs/* functions." Keeping them distinct preserves the bisect signal — if a future regression surfaces against the B-gate,
git log --grep "ratchet to rank B"points directly to this PR. -
Ratchet to rank A (CC ≤ 5). Too strict. The rank-A ceiling would trip on natural state-machine functions (
_run_query_fast_paths,_classify_parsed_tree,_avro_dict_to_arrow) whose internal complexity is well-formed "small switch over a closed enum." Forcing those below rank-A would push the dispatch one layer further, producing strictly harder-to-read code without a complexity win at the call site. -
Use cognitive complexity instead. SonarQube's cognitive complexity metric is more nuanced than cyclomatic for the "deep nesting vs. flat dispatch" axis. xenon doesn't support cognitive complexity; radon supports Halstead and maintainability index but not cognitive. Adopting cognitive complexity would require switching tools (SonarCloud was already declined per ADR 0035; pylint has it but isn't in the lint chain). The marginal value is real but the operational cost of adoption exceeds it.
-
Ratchet
--max-modulesto B in the same PR. Rejected per the rationale above. Module averages are aggregate; the right ratchet for them is a separate audit + per-module PR sequence if/when justified, not a side-effect of the per-function ratchet.
References¶
- ADR 0035 — the foundational decision introducing the gates; this ADR is the third in the ratchet sequence (0035 → 0036 → 0041).
- ADR 0036 — the C-ratchet ADR; remains in effect for the gate's promotion-to-required status and the bucket A/B/C taxonomy this ADR layers on top of. See ADR 0036's dated update at the bottom for the cross-reference.
- radon documentation — rank-tier definitions (A: 1-5, B: 6-10, C: 11-20, D: 21-30, E: 31-40, F: ≥41).
- xenon documentation —
CLI threshold semantics (
--max-absolute,--max-modules,--max-average). - Campaign PRs: #90 through #102 (excluding #97 lychee-retry hotfix and #101 lychee-max-redirects hotfix, which are unrelated).
AGENTS.md"Pre-PR gate (mandatory)" — the contract this ADR extends;make verifyrunsmake quality-complexityagainst the new ceiling.
Update (2026-05-28): follow-up --max-modules ratchet in ADR 0042¶
This ADR's "Why we keep --max-modules at C" section held the
per-module-average ratchet open as "the right ratchet for them is a
separate audit + per-module PR sequence if/when justified, not a
side-effect of the per-function ratchet."
A focused audit immediately following this PR's merge tested that
assumption empirically: the per-function C→B campaign drove module
averages down to ≤B as a side effect. xenon --max-modules B
src/bqemulator exited 0 against main at HEAD 247fb60 (this PR's
own merge commit) with zero refactor work required. The audit found:
- 0 modules with CC average above rank B (CC > 10).
- 14 modules in the rank-B band (CC avg 6.01–10.00) — all passing the rank-B threshold; would become the audit list for a future B→A ratchet (out of scope).
- 120+ modules at rank A (CC avg ≤ 5.0).
The mechanism: splitting branchy functions into dispatch tables + helpers redistributed complexity weight across more (smaller, lower-rank) functions, which lowered the per-module average even where function count grew. The campaign's per-function bucket-A/B patterns are also per-module-average favourable patterns.
ADR 0042 is the single-PR
config flip (xenon --max-modules C → B) that lands the
already-met threshold as the standing rule. ADR 0041 is not
superseded — it remains the per-function ratchet ADR; ADR 0042
documents the per-module-average ratchet as a complementary axis.
The "Why we keep --max-modules at C" section above is the
historical record of the held question; ADR 0042 is the resolution.