Conformance tier (Tier 5)¶
This document maps the conformance tier's modules to the contracts they enforce. For the design rationale see ADR 0022; for the operator-facing guide see conformance-corpus.md.
What the tier does¶
For every query in tests/conformance/sql_corpus/:
- Record (one-time, manual) —
scripts/record_conformance_fixtures.pyruns the query against real BigQuery, captures the output rows and schema, and writes them toexpected.jsonalongside the BigQuery job id, the bytes-processed count, and the wall-clock duration. - Replay (CI, per-PR + ad-hoc) —
tests/conformance/test_corpus.pyparametrises one pytest test per fixture, runs the query against the in-process emulator, and diffs the result against the recorded baseline with type-aware tolerance. The replay is offline (no BigQuery credentials required); the per-PR gate is wired into.github/workflows/ci.yml. The standalone.github/workflows/conformance.ymlworkflow exists for ad-hocworkflow_dispatchinvocation against a specific branch (typical use case: verifying a recording-session PR before opening it).
The corpus currently ships 1141 active SQL fixtures + 48 HTTP +
26 gRPC = 1215 active fixtures (plus 18 INFORMATION_SCHEMA
stubs awaiting operator-side recording). 13 fixtures are pinned
XFAIL in tests/conformance/divergences.py as permanent
design-decision divergences. The pass-rate gate is 100%
non-divergent — every fixture must either pass or be in the
divergences registry. ADR 0022 §5 documents the gate in detail.
Module map¶
tests/conformance/
__init__.py # tier-level docstring + rule summary
conftest.py # shared fixtures, marker enforcement, seed
_corpus.py # fixture discovery + ${DATASET} substitution
_comparison.py # type-aware row + schema diff
_row_encoding.py # rows ⇆ JSON (shared with the recorder)
divergences.py # explicit xfail registry
test_corpus.py # parametrised runner
sql_corpus/
README.md
<surface>/
<fixture>/
query.sql
setup.sql
expected.json
scripts/
record_conformance_fixtures.py
| Module | Responsibility |
|---|---|
_corpus.py |
Walks sql_corpus/, builds an immutable Fixture per directory, exposes substitute_dataset(sql, dataset) and split_statements(script) so both the recorder and the runner produce the same SQL stream. |
_comparison.py |
The compare_results(expected, actual_rows, actual_schema) -> CompareReport entry point. Dispatches per BigQuery type to the rule documented in ADR 0022 §3 (exact for ints, ULP-scale for floats, ±1 µs for datetimes, etc.). |
_row_encoding.py |
Encodes google-cloud-bigquery's Python objects into the recorder's JSON shape. Shared by the runner so the actual emulator output is normalised the same way before diff. |
divergences.py |
Constants-only module: KNOWN_DIVERGENCES: dict[str, str] mapping fixture id → ADR-rooted rationale. Adding an entry pins the fixture to xfail(strict=True); removing one un-pins it. |
test_corpus.py |
Discovers fixtures once at import time and parametrises one pytest test per fixture. For fixtures with a setup.sql, creates a per-test dataset on the emulator, executes setup, runs the query, drops the dataset. Literal-only fixtures skip dataset creation. |
conftest.py |
Auto-applies the conformance marker; seeds the RNG (the conformance tier does not generate random inputs but the seed is logged in the session header for parity with the chaos tier). Exposes the conformance_dataset helper fixture for ad-hoc tests not parametrised by the corpus. |
scripts/record_conformance_fixtures.py |
The recorder. Walks the same corpus, executes against a real BigQuery project, writes expected.json. Enforces the byte-scan cap, refuses to overwrite without --force, logs every (fixture, job_id, bytes, ms) tuple. |
Fixture data shape¶
Two shapes exist; the runner branches on the optional error
field's presence (see ADR 0022 §3 Error parity).
expected.json (version 1 — success / row fixtures):
{
"fixture_version": 1,
"bigquery": {
"project": "<recording project>",
"job_id": "<BigQuery job id>",
"location": "US",
"total_bytes_processed": 1024,
"total_bytes_billed": 10485760,
"duration_ms": 1385
},
"schema": [
{"name": "n", "type": "INT64", "mode": "NULLABLE"}
],
"rows": [
{"n": 42}
],
"row_count": 1,
"duration_class": "fast"
}
expected.json (version 2 — error fixtures):
{
"fixture_version": 2,
"bigquery": {
"project": "<recording project>",
"job_id": null,
"location": "US",
"total_bytes_processed": 0,
"total_bytes_billed": 0,
"duration_ms": 525
},
"error": {
"reason": "notFound",
"location": null,
"http_status": 404,
"message_pattern": "Not\\ found:\\ Table\\ [\\w\\-\\.:]+\\.t_does_not_exist_xyz\\ was\\ not\\ found\\ in\\ location\\ US",
"message_sample": "Not found: Table myproj:ds.t_does_not_exist_xyz was not found in location US"
},
"duration_class": "fast"
}
The v2 shape adds an optional error envelope alongside (and
exclusive of) schema + rows. The runner branches on the
field's presence:
- No
errorfield ⇒ success-expected (the v1 contract; runner diffs rows + schema usingcompare_results). errorfield present ⇒ error-expected (runner expects the emulator to raiseGoogleAPIErrorand diffs viacompare_error:reason/location/http_statusexact match;messageregex-matched againstmessage_patternviare.searchwith DOTALL).
The 644 pre-existing v1 fixtures stay backward-compatible because
they lack an error field. Only newly-recorded fixtures get
fixture_version 2.
The bigquery.* block is metadata only — the runner reads it solely
for diagnostic messages. The comparison is driven by schema and
rows. The recorder always writes both; hand-editing either is a
non-negotiable disqualifier.
Comparison tolerance summary¶
Locked by ADR 0022 §3:
| Type | Rule |
|---|---|
INT64, BOOL, STRING, BYTES, DATE |
Exact equality |
NUMERIC, BIGNUMERIC |
Decimal equality |
FLOAT64 |
math.isclose(rel_tol=1e-12, abs_tol=1e-15) |
TIMESTAMP, DATETIME, TIME |
abs(a - b) ≤ 1 µs |
GEOGRAPHY |
WKT after whitespace + case normalisation |
ARRAY |
Length + ordered element-wise (recursive) |
STRUCT |
Per-field (recursive) |
RANGE |
Equality on {"start", "end"} shape |
INTERVAL |
Canonical YEAR TO SECOND string |
JSON |
Parsed json.loads equality |
Schema-level: case-normalised type aliases (INTEGER ↔ INT64, etc.)
and an absent mode is treated as NULLABLE.
Divergence policy¶
Three rules from ADR 0022 §4:
- Every divergence has a rationale string in
tests/conformance/divergences.pythat references an ADR ordocs/reference/out-of-scope.md. - The marker is
xfail(strict=True)so closing a divergence (the emulator catches up) shows up as a test failure on the next run. - The fixture stays in the corpus with its recorded baseline; the divergence is recorded but the SQL is still exercised against the emulator.
The initial divergence set (spheroidal-vs-planar GEOGRAPHY at
continental scales) is anchored in
ADR 0019 and
out-of-scope.md.
Runner / recorder symmetry¶
The recorder and the runner share three modules:
_corpus.py— same discovery + same substitution._row_encoding.py— same JSON serialisation for rows and schemas.- (Implicitly)
_comparison.py— the recorder writes the same shape the runner expects to receive, so the diff path is symmetric.
This is load-bearing: any divergence between the two would produce spurious test failures unrelated to actual emulator behaviour. The shared modules eliminate that risk.
CI integration¶
The conformance corpus is the project's primary parity-with-BigQuery
guarantee and runs on every PR + push to main via the
conformance job in
.github/workflows/ci.yml.
That per-PR gate is the release-blocker check — a regression there
must be resolved before merge. The standalone
.github/workflows/conformance.yml
workflow is workflow_dispatch only and exists for ad-hoc operator
invocation against a specific branch (typical use case: verifying a
recording-session PR before opening it via
gh workflow run conformance --ref <branch>).
Both invocations are offline: neither calls the recorder nor real
BigQuery. There is no GCP service-account key stored in repository
secrets; the workflows simply install the project and run
pytest tests/conformance -m conformance against the in-process
emulator with the committed expected.json baselines.
A failing run is an alert that either:
- The emulator regressed (most common — fix and re-merge).
- BigQuery itself shipped a change (rare — re-record locally and review).
- A divergence was closed (an xfail'd fixture now passes — remove
the entry from
divergences.py).
Re-recording is a deliberate local action. The maintainer runs
scripts/record_conformance_fixtures.py on their workstation
against a real BigQuery project they control, reviews the diff in
their editor, and commits the changed expected.json files in a
normal PR. Keeping recording out of CI removes the GCP-key
attack surface and prevents an automated drift-fixer from silently
overwriting baselines.