The conformance corpus¶
The conformance corpus is the emulator's promise that its SQL output
matches real BigQuery for the queries we care about. Every fixture in
tests/conformance/sql_corpus/ is a canonical query whose recorded
output from real BigQuery is diff'd against the emulator's output at
test time. This guide explains how to read it, add a fixture,
re-record after a change, and interpret divergences.
For the design rationale see ADR 0022. For the tier's place in the broader strategy see testing-strategy.md (Tier 5).
Quick start¶
Run the corpus against the in-process emulator (no credentials required):
The runner is fully offline — it re-uses each fixture's recorded
expected.json and only talks to the emulator. Re-recording is a
separate, credentialed action (see "Recording" below).
Adding a fixture¶
A fixture is a directory under tests/conformance/sql_corpus/<surface>/
with this shape:
tests/conformance/sql_corpus/
rest_crud/
select_where_eq/
query.sql # the SQL under test (required)
setup.sql # fixture seed (optional)
expected.json # baseline from real BigQuery (generated)
Pick the <surface> subdir that best matches the surface the fixture
exercises. The catalog is in
sql_corpus/README.md.
Authoring rules¶
${DATASET}is the only placeholder. Reference your seed tables as`${DATASET}.orders`(note the backticks — BigQuery needs them for fully-qualified names that include-in the project segment).- Use
CREATE OR REPLACE TABLEinsetup.sql. The recorder re-runs the setup on every--forcerecording; idempotent DDL keeps re-recording deterministic. - No time-dependent SQL.
CURRENT_TIMESTAMP(),CURRENT_DATE(),RAND(),SESSION_USER(),FOR SYSTEM_TIME AS OF <relative>are all excluded by ADR 0022 — their baselines would drift on every re-record. - No
INFORMATION_SCHEMAqueries. Those return dataset names that vary between recording (BigQuery) and runtime (the emulator) and have no clean canonicalisation. - Single output column types per row. Mixing strict-int and string-of-int values in the same column triggers BigQuery's type inference differently from the emulator's; pick one.
A typical literal-only fixture:
A fixture with a seed:
-- tests/conformance/sql_corpus/rest_crud/select_where_eq/setup.sql
CREATE OR REPLACE TABLE `${DATASET}.orders` (
order_id INT64,
customer STRING,
amount NUMERIC,
order_date DATE
);
INSERT INTO `${DATASET}.orders` (order_id, customer, amount, order_date) VALUES
(1, "Alice", NUMERIC "100.00", DATE "2024-01-15"),
(2, "Bob", NUMERIC "250.50", DATE "2024-01-15");
-- tests/conformance/sql_corpus/rest_crud/select_where_eq/query.sql
SELECT order_id, customer
FROM `${DATASET}.orders`
WHERE customer = 'Alice'
ORDER BY order_id
Adding an error fixture (P3.a / ADR 0022 §3)¶
An error fixture is a fixture whose query.sql is expected to
fail against real BigQuery. The runner expects the emulator to
raise a matching GoogleAPIError and diffs the four-field error
shape (reason / location / http_status /
message_pattern) against the recorded BigQuery baseline. The
file layout is identical to a success fixture; the recorder
detects the failure and writes an error envelope to
expected.json instead of schema + rows:
-- tests/conformance/sql_corpus/standard_functions/error_syntax_unclosed_paren/query.sql
SELECT (1 + 2 AS x
Naming convention: prefix the fixture directory with error_ so
ls error_* enumerates the error-shape fixtures at a glance.
Choose a thematically-appropriate surface subdir
(standard_functions/error_syntax_* for syntax errors,
rest_crud/error_table_not_found for resource-not-found,
routines_scripting/error_routine_not_found for routine references,
etc.). The naming + subdir placement is convention only — the
runner's branch is driven by the recorded error field.
If the fixture needs a dataset (e.g., for error_table_not_found
the dataset must exist before the query can reference a missing
table inside it), provide a minimal idempotent setup.sql:
-- tests/conformance/sql_corpus/rest_crud/error_table_not_found/setup.sql
-- The runner provisions the dataset; this no-op query forces
-- setup.sql to be non-empty (a comment-only file is rejected by
-- BigQuery's parser).
SELECT 1 AS placeholder;
Then record as normal (make record-conformance --filter
error_). The recorder will write a v2 expected.json with
an error envelope — the message_pattern is a regex with
the per-fixture dataset FQDN substituted to a wildcard and
line:column markers normalised to \[\d+:\d+\], so the same
pattern survives re-recordings against different BQ projects.
After recording, run make test-conformance to verify the
emulator's error matches BigQuery's. If a divergence surfaces,
either fix the emulator's error renderer (see
src/bqemulator/jobs/error_mapper.py
for the existing translator) or add an xfail(strict=True)
entry to divergences.py with an ADR-rooted rationale.
Recording (local-only)¶
Recording is a local action, never run from CI. The recorder is
the only path that produces expected.json — hand-editing the file
is a non-negotiable disqualifier; every payload must include the
BigQuery job_id of the job that produced it.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
export BQEMU_CONFORMANCE_PROJECT=<bq-project-you-control>
make record-conformance
# …or invoke the script directly for finer control:
python scripts/record_conformance_fixtures.py \
--project "$BQEMU_CONFORMANCE_PROJECT" --location US
After recording, git diff the changed expected.json files,
confirm the changes reflect a real upstream BigQuery change (or your
fixture edits), commit, and open a PR. CI exercises the new
baselines on the next push.
Useful flags:
| Flag | Effect |
|---|---|
--filter <substring> |
Re-record only fixtures whose <surface>/<name> contains the substring. |
--force |
Overwrite an existing expected.json. Default is to skip. |
--dry-run |
Print the plan without running queries. |
--byte-cap <bytes> |
Refuse any fixture that scans more than this (default: 1 GiB). |
--verbose |
DEBUG-level logging — includes the first line of each setup statement. |
The recorder logs every BigQuery job id to stdout. Save the output of a recording session: if a baseline ever becomes suspect, the job id lets you replay the exact query in real BigQuery's job history.
Cost guard¶
The recorder enforces a per-fixture byte-scan cap (default 1 GiB)
using BigQuery's total_bytes_processed. A fixture that exceeds the
cap is logged and the corresponding expected.json is not
written, so a subsequent test run reports the fixture as
"unrecorded" rather than silently picking up a stale baseline. The
recorder exits non-zero whenever the cap is hit.
For the v1.0.0 corpus the per-fixture cost is in single-digit cents worst-case (each fixture seeds kilobytes of synthetic data).
Interpreting test output¶
The runner produces per-fixture pytest output:
- PASS — emulator output matches the recorded baseline within the tolerances documented in ADR 0022 §3.
- FAIL — a real divergence. The failure message names the
fixture id, the BigQuery
job_idof the recorded baseline, and the first three diffing cells (column path + expected vs actual). - XFAIL — the fixture is in
tests/conformance/divergences.py(e.g. spheroidal-vs-planar GEOGRAPHY at continental scales) and the divergence still holds — expected. - XPASS — the fixture is in the divergences dict but the emulator
and real BigQuery agreed. Because
strict=Trueis applied, this fails the test. The remediation is to delete the entry fromdivergences.py; the next run will re-grade as PASS.
Adding a known divergence¶
When a fixture intentionally diverges from real BigQuery, add it to
tests/conformance/divergences.py:
KNOWN_DIVERGENCES["specialized_types/st_distance_continental"] = (
"Spheroidal-vs-planar divergence — see ADR 0019 and "
"docs/reference/out-of-scope.md#spheroidal-geometry-on-geography"
)
Conventions:
- The rationale must reference an ADR or
docs/reference/out-of-scope.md. Invented divergences are forbidden. - When the divergence is later closed (a fix lands, or the upstream
changes), remove the entry. The runner's
strict=Truewill then flag any latent re-introduction as an XPASS, which fails the suite.
Re-recording after a real-BigQuery change¶
If BigQuery itself ships a behavioural change that breaks a baseline:
- Identify the affected fixtures from CI failure output.
python scripts/record_conformance_fixtures.py --filter <substring> --force.git difftheexpected.jsonchanges. Inspect them — confirm they reflect the real upstream change and not a recording error.- Commit the new baselines with a message referencing the upstream change.
- Push and let CI re-run.
If the upstream change permanently breaks the emulator's behaviour:
either add the fixture to divergences.py with an ADR-rooted
rationale or fix the emulator before merging.
CI integration¶
The conformance workflow runs weekly (Mondays at 07:23 UTC) and
on demand via the GitHub Actions UI. Configuration is in
.github/workflows/conformance.yml.
CI is offline: it installs the project, runs
pytest tests/conformance -m conformance against the in-process
emulator, and asserts the diff vs the recorded baselines is empty.
There is no GCP credential stored in repository secrets, the
recorder is not invoked, and no live BigQuery query is made. A
green CI run means the emulator still matches the recorded
baselines; a red run means the emulator either regressed or
real BigQuery's behaviour shifted in a way the maintainer needs to
triage.
Re-recording is a deliberate human action, done locally:
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/sa-key.json
python scripts/record_conformance_fixtures.py \
--project <bq-project> --location US --force
# review the diff, commit, open a PR
This keeps the GCP service-account key off GitHub, prevents an automated drift-fixer from silently rewriting baselines, and forces every baseline change through human review.
Auditing the corpus¶
Every expected.json carries the BigQuery job_id that produced it.
To audit that the corpus values are actually recorded against real
BigQuery (rather than hand-tuned):
# Confirm every expected.json has a non-empty job_id
find tests/conformance/sql_corpus -name expected.json -print0 \
| xargs -0 jq -e '.bigquery.job_id != null and .bigquery.job_id != ""' >/dev/null \
&& echo "All baselines carry a recorded job_id."
This invariant is enforced by a CI step that consults the recorded
job_id on every baseline.