Out of scope¶
This project follows a no-deferral principle: when a feature is in scope, it ships complete in its phase. Scope boundaries are explicit — recorded here with rationale — and never presented as "coming in v1.1". Anything on this page can be reconsidered for v2 as a separate product decision, typically via an RFC.
Each excluded feature raises a clear UnsupportedFeatureError when
encountered, with a link back to this page.
Excluded in v1.0.0¶
BigQuery ML¶
CREATE MODEL, ML.PREDICT, ML.EVALUATE, ML.FORECAST, ML.GENERATE_*,
and all ML-related model types (ARIMA, k-means, matrix factorization, DNN,
boosted trees, AutoML).
Only Models resource CRUD — list / get / insert / patch / update / delete of model metadata — is supported.
Rationale: full BQML would be a project of comparable size to the rest of the emulator. See ADR 0012.
BI Engine¶
BigQuery's in-memory acceleration tier.
Rationale: a performance optimization with no observable semantic effect. Irrelevant for a local emulator that runs in a single process.
Reservations, assignments, capacity commitments¶
BigQuery's slot billing model.
Rationale: billing-plane concepts; emulator has no billing.
Slot and byte-billing simulation¶
Query cost accounting.
Rationale: no local analog and no cost model. dryRun requests
are parsed and validated, but statistics.query.totalBytesProcessed is
always returned as "0": the emulator has no way to estimate
bytes scanned, because DuckDB's storage layout doesn't map onto
BigQuery's columnar pricing. Treat a dryRun response carrying
totalBytesProcessed "0" as "validation passed", not as a cost estimate.
Data Transfer Service¶
Scheduled transfers from external sources (Google Ads, Campaign Manager, S3, etc.).
Rationale: dozens of connectors, each a separate integration. Different product scope.
Scheduled queries¶
Cron-managed query runs in the bigquery-data-transfer.googleapis.com surface.
Rationale: scheduling plane, not SQL semantics. Use a local cron / CI scheduler to run queries against the emulator.
Cloud Logging / Cloud Monitoring integration¶
Export of emulator activity to Google-hosted observability.
Rationale: emulator exposes its own Prometheus and OpenTelemetry instrumentation; integration with Google's logging stack is not useful locally.
Cross-region replication¶
Dual-region or multi-region dataset replication semantics.
Rationale: no geographic model in the emulator.
IAM enforcement¶
IAM policies on datasets and tables are stored and returned from the REST API (so client code that round-trips them works), but they are not enforced — the emulator accepts any credentials.
Row access policies, by contrast, ARE enforced — queries are rewritten to apply the policy's filter.
Rationale: the emulator is an integration-test target, not an authorization gateway. Enforcing IAM would require a real identity provider.
Conformance fixtures pinned to this section:
- row_access/caller_information_schema_visibility
(real BigQuery returns 404 NotFound on
INFORMATION_SCHEMA.ROW_ACCESS_POLICIES queries when the caller
lacks bigquery.rowAccessPolicies.list; the emulator surfaces
the policy row because IAM is not enforced)
Durable Storage Write API stream state¶
BigQueryWrite streams are kept in memory only — a process restart
drops every in-progress stream, including any buffered rows in PENDING
and BUFFERED streams that were never flushed. Clients that restart
mid-write must use retry-with-offset on COMMITTED streams (the
emulator correctly returns ALREADY_EXISTS for already-ingested pages
within a single process lifetime).
Rationale: the emulator is ephemeral-by-default (EPHEMERAL
persistence mode is the recommended CI configuration). Adding durable
stream state would require a persistent WAL layer that duplicates
DuckDB's storage engine without matching BigQuery's production
semantics precisely. See ADR 0013.
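The retry-with-offset behaviour described above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the emulator's code: `InMemoryCommittedStream`, `AlreadyExists`, `OutOfRange`, and `append_idempotently` are hypothetical names introduced only for this example. It shows why resending a batch at its original offset is safe within one process lifetime; after a restart the stream itself is gone, so this pattern does not recover dropped streams.

```python
class AlreadyExists(Exception):
    """Stand-in for the ALREADY_EXISTS status mentioned above."""


class OutOfRange(Exception):
    """Stand-in for an append whose offset skips ahead of the stream."""


class InMemoryCommittedStream:
    """Hypothetical model of a COMMITTED stream held in process memory."""

    def __init__(self):
        self.rows = []

    def append(self, rows, offset):
        # An offset below the current length was ingested by an earlier call.
        if offset < len(self.rows):
            raise AlreadyExists(offset)
        # An offset above the current length would leave a gap.
        if offset > len(self.rows):
            raise OutOfRange(offset)
        self.rows.extend(rows)


def append_idempotently(stream, rows, offset):
    """Append a batch at a fixed offset and return the next offset to use."""
    try:
        stream.append(rows, offset)
    except AlreadyExists:
        # A previous attempt already landed this batch; nothing to resend.
        pass
    return offset + len(rows)
```

Calling `append_idempotently` twice with the same batch and offset leaves the stream unchanged the second time, which is the property a client retry loop relies on.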
Durable upload session state¶
Resumable upload sessions opened via the upload host
(/upload/bigquery/v2/projects/{p}/jobs?uploadType=resumable) are
kept in memory only. A process restart drops every in-progress
session, including the partially uploaded bytes staged on disk under
Settings.upload_staging_dir. Clients that restart mid-upload must
restart the upload from offset 0.
Rationale: same ephemeral-by-default charter as the Storage Write
API. The session map's lifetime is bound to the emulator's process
lifetime; adding cross-restart persistence would require a side
journal (session id → staging path + received-bytes counter) that
duplicates state already held in the staging file's size on disk
without matching BigQuery's production semantics precisely. The
emulator default of upload_session_ttl_seconds=3600 is operator-
tunable (1 minute to 24 hours); see
ADR 0029.
Storage Write API schema evolution¶
Real BigQuery's AppendRowsResponse.updated_schema propagates a
table-schema change back to writers when the table is altered
mid-stream. The emulator does not emit this field; writers are expected
to treat the schema supplied in writer_schema as authoritative for
the duration of the connection.
Rationale: the emulator does not yet support ALTER TABLE on active tables across concurrent writers.
Storage Write API trace propagation¶
AppendRowsRequest.trace_id and
AppendRowsRequest.missing_value_interpretations are ignored. Values
supplied by the client are not stored or returned.
Rationale: diagnostic-only fields in BigQuery; they have no effect on row persistence. Revisit if the community asks for trace-id pass-through into the emulator's OpenTelemetry spans.
Online backup of a running emulator¶
bqemulator backup and bqemulator restore require the emulator to
be stopped. A running emulator holds an exclusive
DuckDB file lock; both commands open the file directly via
duckdb.connect and would deadlock against a live server. Real
BigQuery's implicit always-online backup has no local analog.
Rationale: a hot-backup endpoint would add a write surface on the diagnostic admin router (we kept it read-only on purpose; see ADR 0020) or require WAL-aware filesystem integration. Both are larger in scope than the integration-test charter v1.0.0 sets.
Workaround: run the emulator under a copy-on-write filesystem
(btrfs / ZFS / Docker volume snapshot) and snapshot the underlying
volume while the emulator runs. The snapshot can be restored into a
fresh data_dir and started with bqemulator start --data-dir <snap>.
PersistenceMode.IMPORT enum value¶
The enum value bqemulator.config.PersistenceMode.IMPORT exists but
no code path reads it. ADR 0020 retired the original "live schema sync
against a real BigQuery project" design in favour of the one-shot
bqemulator import --from-project=… CLI command.
Rationale: a live-sync persistence mode would double the credential surface (the server would need ADC) and create an ongoing dependency on the real BigQuery REST API that's incompatible with offline test environments — exactly the use case the emulator exists to serve.
Workaround: run bqemulator import once to materialise schemas,
then start the server normally (persistence_mode=PERSISTENT).
The enum value is kept to preserve backwards compatibility for any caller that hard-coded it. A future v2 deprecation cycle may remove it; until then, it has no behavioural effect.
BIGNUMERIC literals with 39 integer digits¶
BigQuery's BIGNUMERIC type has a scale of 38 and a precision of 76.76
decimal digits: 38 fractional digits plus up to 39 integer digits, the
39th only partially (i.e. up to 77 total digits, the 77th partial). DuckDB's
widest DECIMAL is DECIMAL(38, s), with the total digit count capped at 38.
Contract:
- Literals where integer_digits ≤ 38: ✅ Accepted. The pre-translator's Path C (see bqemulator.sql.rewriter.numeric_literals) truncates the fractional part to 38 - integer_digits digits when the combined integer + fractional count exceeds 38. Examples:
  - BIGNUMERIC '1.234567890123456789012345678901234567890' (1 int + 39 frac) → stored as DECIMAL(38, 37) with the last two fractional digits dropped.
  - BIGNUMERIC '12345678901234567890123456789012345678.123456789' (38 int + 9 frac) → stored as DECIMAL(38, 0) with the entire fractional part dropped. The wire-format schema renderer surfaces this column as NUMERIC (scale ≤ 9) rather than BIGNUMERIC; see the "Documented corner case" note below.
- Literals where integer_digits ≥ 39: ❌ Rejected. The literal falls through to bqemu_to_bignumeric, which raises an InvalidOperation / Conversion Error. The canonical example is BigQuery's BIGNUMERIC max value (5.7896…e38 with 39 integer digits + 38 fractional digits), pinned by standard_functions/bound_bignumeric_max.
Documented corner case: when Path C truncation drops the
fractional scale to ≤ 9, the schema renderer's "scale > 9 →
BIGNUMERIC" inference falls back to NUMERIC. A bare
SELECT BIGNUMERIC '12345678901234567890123456789012345678.123'
(38 int + 3 frac, total 41) lands on the wire as NUMERIC. This only
affects naked SELECT BIGNUMERIC '…' AS col queries — BIGNUMERIC
literals bound into BIGNUMERIC-typed columns retain their column
type because the schema is determined by the column definition,
not the literal's scale.
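The contract and corner case above can be sketched as a small pure-Python function. This is an illustration of the documented rule, not the code in bqemulator.sql.rewriter.numeric_literals: the names `truncate_bignumeric_literal` and `wire_type` are hypothetical, and the sketch assumes digits are counted exactly as written in the literal (no normalisation of leading zeros or exponents).

```python
MAX_DIGITS = 38  # DuckDB's widest DECIMAL precision, as stated above


def truncate_bignumeric_literal(text: str) -> tuple[str, int]:
    """Return (truncated literal text, resulting scale) for a decimal string."""
    sign = "-" if text.startswith("-") else ""
    body = text.lstrip("+-")
    int_part, _, frac_part = body.partition(".")
    # 39 or more integer digits cannot fit in DECIMAL(38, s): rejected.
    if len(int_part) > MAX_DIGITS:
        raise ValueError(f"{len(int_part)} integer digits exceed DECIMAL(38, s)")
    # Keep at most 38 - integer_digits fractional digits (truncate, no rounding).
    scale = min(len(frac_part), MAX_DIGITS - len(int_part))
    kept = int_part + ("." + frac_part[:scale] if scale else "")
    return sign + kept, scale


def wire_type(scale: int) -> str:
    """Mirror the 'scale > 9 -> BIGNUMERIC' inference quoted in the note above."""
    return "BIGNUMERIC" if scale > 9 else "NUMERIC"
```

For the 38 int + 3 frac literal from the corner case, the function returns scale 0, and `wire_type(0)` gives NUMERIC, matching the behaviour described for a bare SELECT.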
Rationale (XFAIL'd fixture only): matching BigQuery's full
BIGNUMERIC range for the > 38 integer-digit case requires either
bundling a wide-decimal library (e.g. Python's
decimal.Decimal is unlimited, but routing it through DuckDB's
storage means storing BIGNUMERIC columns as VARCHAR and rewriting
every arithmetic / comparison / aggregation through a Python
helper UDF — multi-week scope) or replacing DuckDB as the storage
engine. Both are scope expansions far beyond what the conformance
corpus's single fixture warrants.
Workaround: for the bound_bignumeric_max case specifically,
stay within DuckDB's DECIMAL(38, 0) integer range — 38
integer digits (≈ 1e38) is more than sufficient for every
practical financial / scientific / cryptographic use case the
emulator targets. The fixture stays XFAILed against this section.
Conformance fixtures pinned to this section:
- standard_functions/bound_bignumeric_max (39 integer digits —
XFAILed against DuckDB's 38-digit DECIMAL cap).
Spheroidal geometry on GEOGRAPHY¶
BigQuery's GEOGRAPHY uses a spherical geometry model — it
uses S2's documented kEarthRadiusMeters = 6371010.0. The
emulator ships 5 translator rules in
bqemulator.sql.rules.spatial
(StDistanceSpheroidalRule / StLengthSpheroidalRule /
StAreaSpheroidalRule / StPerimeterSpheroidalRule /
StDWithinSpheroidalRule) plus 4 Python helper UDFs
(bqemu_st_{distance,length,area,perimeter}_spheroidal) that route
the metric-returning SQL surfaces through 3D-unit-vector great-circle
math (atan2 / cross / dot) and L'Huilier-fan spherical excess on the
S2 sphere. All 4 continental fixtures (st_distance_continental /
st_area_continental / st_length_continental /
st_perimeter_continental), all 12 metric spheroidal fixtures
(6 distance + 1 high-latitude + 3 area + 2 length), and the
small-scale st_dwithin_no predicate match BigQuery's recording
to within rel_tol=1e-12. The remaining spheroidal-bucket
fixtures below describe surfaces the spherical helpers do NOT
yet cover:
- ST_BUFFER (4 fixtures: 3 buffer + st_buffer_continental): generating BigQuery's exact 33-vertex geodesic-circle polygon needs a per-vertex bearing generator that emits the same azimuth / step / radius coordinates as BigQuery's internal algorithm.
- ST_AsBinary (1 fixture: st_asbinary_point): BigQuery encodes ST_GEOGPOINT(1, 1) via an ECEF→lng/lat round-trip that loses 1 ULP per axis. Matching the recorded base64 needs the same round-trip.
- ST_AsGeoJSON on multi-vertex shapes (4 fixtures): BigQuery interpolates midpoints along geodesic arcs in GeoJSON output for LINESTRING / MultiLineString / GeometryCollection / MultiPolygon.
- ST_Centroid + ST_Intersection on small polygons (2 fixtures): the centroid sits at (2, 2.00040218892024) spheroidally vs exactly (2, 2) planar; the intersection's edges bulge by ~1.2e-3 degrees along geodesics.
Rationale: a complete spheroidal implementation (closing every remaining fixture above) would require the S2 library or equivalent spheroidal backend — substantial complexity for fixtures that the existing helpers already close for the common ST_DISTANCE / ST_AREA / ST_LENGTH / ST_PERIMETER / ST_DWITHIN paths.
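The 3D-unit-vector great-circle math mentioned above (atan2 / cross / dot on a sphere of radius 6371010.0 m) can be sketched in a few lines. This is a minimal illustration of the technique, not the body of bqemu_st_distance_spheroidal: the function names are hypothetical, and only point-to-point distance is shown.

```python
import math

EARTH_RADIUS_M = 6371010.0  # S2's kEarthRadiusMeters, as quoted above


def unit_vector(lng_deg: float, lat_deg: float) -> tuple[float, float, float]:
    """Convert a lng/lat pair in degrees to a 3D unit vector."""
    lng, lat = math.radians(lng_deg), math.radians(lat_deg)
    return (
        math.cos(lat) * math.cos(lng),
        math.cos(lat) * math.sin(lng),
        math.sin(lat),
    )


def great_circle_distance_m(p: tuple[float, float], q: tuple[float, float]) -> float:
    """Distance in metres between two (lng, lat) points on the sphere."""
    a, b = unit_vector(*p), unit_vector(*q)
    cross = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    dot = sum(x * y for x, y in zip(a, b))
    # atan2(|a x b|, a . b) stays well conditioned for tiny and near-antipodal angles.
    angle = math.atan2(math.sqrt(sum(c * c for c in cross)), dot)
    return EARTH_RADIUS_M * angle
```

The atan2 form is preferred over acos(dot) because acos loses precision when the two points are very close together, which matters at a 1e-12 relative tolerance.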
Shape returns — divergence is small but non-zero at every scale.
ST_CENTROID, ST_INTERSECTION, ST_ASGEOJSON on long edges,
and ST_DWITHIN (the predicate flips when the planar distance
happens to straddle a meter-scaled threshold the spheroidal distance
does not) all return planar-vs-spheroidal coordinate drift because
geodesics curve while planar lines stay straight. The drift is small
in absolute terms (typically <0.001 degrees at small scales) but
exceeds the rel_tol=1e-12 FLOAT64 tolerance the runner uses
for non-WKT-shaped float comparisons. The small-scale
st_centroid_polygon, st_intersection_polygons, and
st_dwithin_no fixtures are examples.
Rationale: the emulator is an integration-test target. A correct
spheroidal implementation would require shipping a second geometry
library (s2geometry or shapely + projection code) and bridging it
into DuckDB storage — substantial complexity for a use case where
real BigQuery is the canonical answer. A unit-conversion shim
(× 111320 × cos(lat)) for the metric surfaces would still
disagree with the recorded baselines because the underlying geometry
is planar (a 10-km line near the equator measures slightly differently
than the same line at 60°N spheroidally; the cosine-scaling shim
would erase that latitude dependence).
Workaround: validate spatial-query shape in CI against the emulator; validate spatial correctness (numeric metric values, exact geodesic-interpolation coordinates) in a separate conformance-against- real-BQ stage. For development-time sanity checks, the emulator's relative ordering of distances and the topology of intersections / buffers is preserved — only the absolute numeric values diverge.
Conformance fixtures pinned to this section (the metric fixtures
except buffer, every continental metric, and st_dwithin_no all
PASS):
- specialized_types/st_buffer_continental
(BigQuery's 33-vertex geodesic-circle polygon's exact vertex
coordinates need a per-vertex bearing/step generator the helpers
don't yet ship)
- specialized_types/st_centroid_polygon
(the centroid of the unit-degree square is exactly (2, 2)
planar but (2.00000000000004, 2.00040218892024) spheroidal —
needs a spheroidal centroid algorithm beyond the metric helpers)
- specialized_types/st_intersection_polygons
(the planar intersection follows straight edges where the
spheroidal one bulges along geodesics — needs a geodesic-arc
intersection)
- specialized_types/st_asbinary_point
(BigQuery encodes ST_GEOGPOINT(1, 1) via an ECEF→lng/lat
round-trip that loses 1 ULP per axis; recorded
x = 0x3FEFFFFFFFFFFFFE ≈ 0.9999999999999998 instead of an
exact 1.0 — needs ECEF round-trip emulation)
- specialized_types/spheroidal_buffer_street_match
(10 m radius buffer; same vertex-exactness gap as
st_buffer_continental)
- specialized_types/spheroidal_buffer_neighborhood_match
(100 m radius buffer; same root cause)
- specialized_types/spheroidal_buffer_state_xfail
(100 km radius buffer; same root cause)
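The st_asbinary_point entry above quotes the recorded bit pattern 0x3FEFFFFFFFFFFFFE. As a worked check, the snippet below decodes that pattern and the pattern for an exact 1.0 as IEEE 754 doubles. It reads the hex constants as written (most significant byte first), which is an assumption about presentation only; the byte order inside the actual WKB payload is not stated in the text.

```python
import struct


def double_from_hex(bits: str) -> float:
    """Interpret a 16-character hex string as a big-endian IEEE 754 double."""
    return struct.unpack(">d", bytes.fromhex(bits))[0]


recorded = double_from_hex("3FEFFFFFFFFFFFFE")  # value BigQuery recorded for x
exact = double_from_hex("3FF0000000000000")     # bit pattern of exactly 1.0

# The gap between the two values is 2**-52, the spacing of doubles at 1.0.
gap = exact - recorded
```

A byte-for-byte comparison of the encoded output therefore fails even though the two coordinates differ by only about 2.2e-16.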
HLL sketch binary format (HLL_COUNT.INIT / MERGE_PARTIAL)¶
BigQuery's HLL_COUNT.INIT and HLL_COUNT.MERGE_PARTIAL return a
BYTES sketch in a specific HyperLogLog++ binary format documented in
the HLL++ paper but not in
a wire-format specification. Bit-exact reproduction would require
test-driven reverse-engineering of BigQuery's bucket-count selection,
Murmur3 hash variant, sparse/dense representation switch, header
framing, and bias-correction tables — a multi-week workstream
disproportionate to the user-facing benefit (sketches authored in
BigQuery cannot be persisted to a table the emulator can read, and
vice-versa).
The cardinality user-facing semantic is preserved. The emulator
routes the two cardinality-extracting patterns —
HLL_COUNT.EXTRACT(HLL_COUNT.INIT(x)) and HLL_COUNT.MERGE(sketch)
over a subquery union of HLL_COUNT.INIT(x) legs — to
COUNT(DISTINCT x) via HllCountExtractInitRule
and HllCountMergeRule,
following the precedent set by APPROX_COUNT_DISTINCT (ADR 0023 §1.I)
and documented in ADR 0024.
For small-cardinality inputs COUNT(DISTINCT) and HLL agree exactly;
for inputs above HLL's bucket count the values agree within
~1.04/√m (HLL's documented standard error).
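To put the ~1.04/√m bound in concrete terms, the sketch below evaluates it for a few bucket counts. It assumes the usual HyperLogLog convention that m = 2^precision; the precision values are illustrative only, since the text above does not state which precision BigQuery or the emulator uses.

```python
import math


def hll_standard_error(precision: int) -> float:
    """Relative standard error 1.04 / sqrt(m) for m = 2**precision buckets."""
    m = 2 ** precision
    return 1.04 / math.sqrt(m)


# Illustrative precisions only; not taken from the emulator's configuration.
for precision in (10, 12, 15):
    print(f"precision={precision} m={2 ** precision} error={hll_standard_error(precision):.4%}")
```

Because the emulator answers these patterns with an exact COUNT(DISTINCT), any difference from a BigQuery recording on large inputs comes from BigQuery's side of this error bound.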
The sketch-as-persistable-BYTES semantic is not preserved.
HLL_COUNT.INIT and HLL_COUNT.MERGE_PARTIAL reach DuckDB unchanged
(both functions have no DuckDB primitive); DuckDB raises a
CatalogException which the emulator surfaces as InvalidQueryError.
Workaround: rewrite the query as
HLL_COUNT.EXTRACT(HLL_COUNT.INIT(x)) or COUNT(DISTINCT x) when
the sketch output is not required downstream. If sketch persistence
is required, run the query against real BigQuery — the emulator is
not a drop-in replacement for that pattern.
Conformance fixtures pinned to this section:
- standard_functions/agg_hll_count_init_basic
- standard_functions/agg_hll_count_merge_partial_basic
DBSCAN clustering (ST_CLUSTERDBSCAN)¶
BigQuery's ST_CLUSTERDBSCAN(geog, epsilon, min_pts) OVER (window)
is a window-shaped aggregate that runs the DBSCAN density-based
clustering algorithm over the windowed geometries and assigns each
input a cluster id (or NULL for noise points). DuckDB-spatial has
no DBSCAN primitive; a correct emulator-side implementation would
need to:
- Materialise the windowed geometries.
- Build an epsilon-neighbourhood index over them (a k-d tree or ball-tree).
- Run the DBSCAN cluster-expansion walk with the documented min_pts minimum density rule.
- Surface the cluster ids back through the window's row order.
The correctness contract is non-trivial (the spheroidal epsilon neighbourhood differs from the planar one for continental-scale inputs; the cluster-expansion order is implementation-defined for ties; the noise-point assignment branches on density at every candidate). Combined with the cardinality-quadratic worst-case runtime over millions of points, the value-to-emulator ratio is poor for v1.0 — DBSCAN is rarely used in queries the emulator is otherwise the right substitute for.
We defer the surface to a future release that ships a dedicated
spatial-clustering backend. Until then, ST_CLUSTERDBSCAN reaches
DuckDB unchanged and raises CatalogException →
InvalidQueryError. No conformance fixture is recorded — the
inventory entry stays 🔴 Uncovered in the matrix and the gap
denominator counts the function against the open gap.
Workaround: run the clustering off-database (Python + scikit-learn or PostGIS) and write the cluster ids back to a BigQuery table the emulator can read.
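For small inputs, the off-database step can be as simple as the pure-Python sketch below. It is a textbook DBSCAN under loudly stated assumptions, not a reproduction of ST_CLUSTERDBSCAN: distances are planar Euclidean rather than spheroidal, neighbourhoods are found by brute force rather than with a k-d tree, a point counts itself towards min_pts, and cluster ids depend on input order. The function name `dbscan` is hypothetical.

```python
import math

NOISE = None  # the text above says noise points receive NULL


def dbscan(points, epsilon, min_pts):
    """Return one cluster id (0, 1, ...) or NOISE per point, in input order."""

    def neighbours(i):
        # Brute-force epsilon-neighbourhood; includes the point itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= epsilon]

    labels = {}
    cluster = -1
    for i in range(len(points)):
        if i in labels:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            was_noise = j in labels and labels[j] is NOISE
            if j in labels and not was_noise:
                continue  # already claimed by a cluster
            labels[j] = cluster
            if was_noise:
                continue  # border point: joins the cluster, is not expanded
            reach = neighbours(j)
            if len(reach) >= min_pts:
                queue.extend(reach)  # core point: keep expanding
    return [labels[i] for i in range(len(points))]
```

The returned list lines up with the input rows, so the ids can be written back alongside the original keys. For continental-scale data, swap `math.dist` for a great-circle distance, since the planar neighbourhood diverges from the spheroidal one as noted above.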
Legacy SQL (useLegacySql=true)¶
BigQuery accepts two SQL dialects on the same wire surface:
Standard SQL (the default) and Legacy SQL (the original
2011-era dialect retained for backward compatibility). The dialect
is selected per-job by the useLegacySql boolean on
QueryJobConfiguration. Legacy SQL has its own parser, function
catalogue, identifier-quoting rules, scoping rules, NULL handling,
and JOIN syntax — it overlaps with standard SQL only superficially.
A query like SELECT INTEGER(1) is valid legacy SQL and invalid
standard SQL; SELECT CAST(1 AS INT64) is the reverse.
Supporting legacy SQL inside the emulator would require either:
- A second translator pipeline (BigQuery legacy → DuckDB) parallel to the existing standard-SQL one, with its own SQLGlot dialect, its own function-mapping table, its own identifier-resolution rules, and its own divergence catalogue. The maintenance burden approximately doubles the translator surface.
- A pre-translator that rewrites legacy SQL to standard SQL before the existing pipeline sees it. This is a documented BigQuery-side migration path (bq query --use_legacy_sql=false after rewriting) but it does not cover the constructs that have no standard-SQL equivalent (e.g., the [project:dataset.table] table-reference syntax, the implicit-FLATTEN scoping for repeated fields, the table-wildcard TABLE_DATE_RANGE family).
BigQuery has officially recommended standard SQL since 2017 and flagged legacy SQL as legacy in every release-note from 2017 onwards. New projects do not enable it; existing projects with legacy SQL workloads have an established migration path off it. The user-impact-to-emulator-effort ratio is poor for v1.0 — clients that rely on legacy SQL are migrating off it independently of whether the emulator supports it.
The emulator ships a narrow legacy-to-standard rewriter in
bqemulator.sql.rewriter.legacy_sql
that handles the type-cast subset (INTEGER, FLOAT, STRING,
BOOLEAN, BYTES) and the [project:dataset.table] reference
shape. These rewrites are strict syntactic substitutions —
INTEGER(x) → CAST(x AS INT64), etc. — so simple legacy
queries that only use these constructs round-trip cleanly through
the standard pipeline.
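The two substitutions described above are simple enough to sketch with regular expressions. This is an illustration of the idea, not the code in bqemulator.sql.rewriter.legacy_sql: `rewrite_legacy` is a hypothetical name, only the INTEGER → INT64 mapping is spelled out in the text (the other targets are assumed from standard SQL type names), and the sketch does not handle nested parentheses or occurrences inside string literals.

```python
import re

# Legacy cast function -> standard SQL type. Only INTEGER -> INT64 is given
# in the text; the remaining targets are assumptions.
LEGACY_CASTS = {
    "INTEGER": "INT64",
    "FLOAT": "FLOAT64",
    "STRING": "STRING",
    "BOOLEAN": "BOOL",
    "BYTES": "BYTES",
}
_CAST_RE = re.compile(r"\b(" + "|".join(LEGACY_CASTS) + r")\(([^()]*)\)", re.IGNORECASE)
_TABLE_RE = re.compile(r"\[([\w-]+):(\w+)\.(\w+)\]")


def rewrite_legacy(sql: str) -> str:
    """Apply the cast and table-reference substitutions to one query string."""

    def cast(match: re.Match) -> str:
        target = LEGACY_CASTS[match.group(1).upper()]
        return f"CAST({match.group(2)} AS {target})"

    sql = _CAST_RE.sub(cast, sql)
    # [project:dataset.table] becomes a backtick-quoted standard SQL path.
    return _TABLE_RE.sub(r"`\1.\2.\3`", sql)
```

A regex pass like this is only safe for the narrow subset listed above; anything that changes scoping or join semantics needs a real parser, which is why the rest of legacy SQL stays out of scope.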
Queries that use legacy-SQL features outside this subset (JOIN EACH,
WITHIN, FLATTEN, the implicit-correlated-subquery rules, the
date_add(NOW(), -7, 'DAY') form, the TABLE_DATE_RANGE family,
etc.) still surface the appropriate translation error from the
standard pipeline. A full legacy-SQL parser remains out of scope.
Workaround for un-rewritten constructs: rewrite the query to
standard SQL (the canonical migration path BigQuery itself
recommends) and submit it with useLegacySql=false (the default).
CTE self-join with window aggregate (TPC-DS Q47)¶
TPC-DS Q47 uses a multi-CTE pattern where a CTE (v1) is
defined with two window aggregates — AVG(SUM(...)) OVER (PARTITION
BY...) for monthly-average sales plus RANK() OVER (PARTITION
BY... ORDER BY d_year, d_moy) for a chronological row-number —
and then self-joined three times in a subsequent CTE (v2):
WITH v1 AS (
SELECT ...,
AVG(SUM(ss_sales_price)) OVER (PARTITION BY ...) AS avg_monthly_sales,
RANK() OVER (PARTITION BY ... ORDER BY d_year, d_moy) AS rn
FROM item, store_sales, date_dim, store
WHERE ...
GROUP BY ...
),
v2 AS (
SELECT v1.*, v1_lag.sum_sales AS psum, v1_lead.sum_sales AS nsum
FROM v1, v1 v1_lag, v1 v1_lead
WHERE v1.rn = v1_lag.rn + 1
AND v1.rn = v1_lead.rn - 1
)
When SQLGlot translates this to DuckDB it inlines v1 three
times into v2. DuckDB's planner raises Binder Error: UNNEST
requires a single list as input on the resulting plan — the
exact internal step that mis-fires is not yet diagnosed.
Closing this divergence cleanly requires either:
- Investigating SQLGlot's inlining strategy for CTEs whose bodies carry window aggregates and emitting an alternative plan (materialise the CTE first via CREATE TEMP TABLE AS SELECT FROM v1 before the self-join). DuckDB does honour CREATE TEMP TABLE, so a pre-translator that materialises any CTE that carries a window aggregate and is referenced more than once would work, but the criteria for "materialise vs inline" need a cost model.
- A DuckDB upstream fix to the planner's UNNEST-related binder for the specific shape SQLGlot emits, which is out of scope here.
The conformance fixture
standard_functions/tpcds_q47
is pinned XFAIL against this divergence. Q47 is the only one of
the 29 TPC-DS fixtures in the corpus that surfaces this issue; the
other 28 picks PASS without code changes.
Workaround: clients that need the same shape should
materialise the CTE manually (CREATE TEMP TABLE v1_materialised
AS SELECT... FROM v1) before the self-join, or refactor to
use LAG/LEAD window functions on the original CTE without
self-joining.
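The LAG/LEAD refactor can be checked on a toy dataset. The sketch below uses Python's built-in sqlite3 module purely because it needs no setup; it is not DuckDB or BigQuery, and the `monthly` table and its columns are hypothetical stand-ins for the elided Q47 body. It runs the self-join shape and the LAG/LEAD shape side by side so the two result sets can be compared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monthly (store TEXT, d_year INT, d_moy INT, sum_sales REAL)")
conn.executemany(
    "INSERT INTO monthly VALUES (?, ?, ?, ?)",
    [("a", 2000, 1, 10.0), ("a", 2000, 2, 20.0), ("a", 2000, 3, 30.0), ("a", 2000, 4, 40.0)],
)

# Original shape: rank the rows, then self-join on rn +/- 1.
SELF_JOIN = """
WITH v1 AS (
  SELECT *, RANK() OVER (PARTITION BY store ORDER BY d_year, d_moy) AS rn
  FROM monthly
)
SELECT v1.d_moy, v1_lag.sum_sales AS psum, v1_lead.sum_sales AS nsum
FROM v1, v1 v1_lag, v1 v1_lead
WHERE v1.store = v1_lag.store AND v1.store = v1_lead.store
  AND v1.rn = v1_lag.rn + 1 AND v1.rn = v1_lead.rn - 1
ORDER BY v1.d_moy
"""

# Refactored shape: one pass with LAG/LEAD, no self-join.
LAG_LEAD = """
WITH v1 AS (
  SELECT d_moy,
         LAG(sum_sales)  OVER (PARTITION BY store ORDER BY d_year, d_moy) AS psum,
         LEAD(sum_sales) OVER (PARTITION BY store ORDER BY d_year, d_moy) AS nsum
  FROM monthly
)
SELECT d_moy, psum, nsum FROM v1
WHERE psum IS NOT NULL AND nsum IS NOT NULL
ORDER BY d_moy
"""

self_join_rows = conn.execute(SELF_JOIN).fetchall()
lag_lead_rows = conn.execute(LAG_LEAD).fetchall()
```

The IS NOT NULL filter matters: the inner self-join silently drops the first and last row of each partition, whereas LAG/LEAD returns NULL for them. The equivalence also assumes the ordering keys are unique within a partition, so RANK() produces no ties.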
ORC extract¶
Status: Excluded permanently.
BigQuery itself does not support ORC as a destination extract
format. The
documented set
is CSV, JSON, PARQUET, AVRO only. Shipping ORC extract in
the emulator would put bqemulator ahead of BigQuery on a surface
where parity matters — a user who extracts to ORC against the
emulator and then tries to repeat the same job against the real
service would get a surprising failure.
Workaround: Extract to Parquet via the existing executor branch,
then convert downstream with pyorc or pyarrow:
# Extract to Parquet from bqemulator, then convert to ORC locally.
import pyarrow.parquet as pq
from pyarrow import orc

# Load the Parquet extract into an in-memory Arrow table.
arrow_table = pq.read_table("extract.parquet")
# pyarrow's ORC writer takes the schema from the table, so no ORC schema string is built by hand.
orc.write_table(arrow_table, "extract.orc")
See ADR 0027 for the
load/extract format-coverage contract. ORC load is supported
via the optional [orc] extra; only ORC write is excluded.
INFORMATION_SCHEMA.JOBS* family¶
Status: Excluded permanently.
BigQuery exposes a JOBS / JOBS_BY_PROJECT / JOBS_BY_FOLDER /
JOBS_BY_ORGANIZATION family of INFORMATION_SCHEMA views that
surface job history (creation_time, total_bytes_processed,
total_slot_ms, cache_hit, user_email, statement type, etc.).
Job history in the emulator is in-memory only and bounded by the
process lifetime. The INFORMATION_SCHEMA.JOBS* views are
typically used for billing- and quota-analysis queries (slot
consumption, bytes billed per user, week-over-week query cost),
none of which the emulator models. Implementing a partial view
that returned the in-memory job list would give false confidence
to production billing queries that the emulator silently won't
match.
Rationale: querying job history is a billing/quota observability concern, not a SQL-semantics concern. The emulator has no billing model and no quota subsystem; the views would return real-looking numbers (rows + bytes from the in-memory job log) that don't translate to BigQuery's billing model.
Workaround: query the REST jobs.list endpoint (which IS
shipped — see api-coverage.md) for the
equivalent metadata. The REST response carries
statistics.query.totalBytesProcessed,
statistics.query.statementType, status.errorResult, and the
other job fields a script-side audit needs:
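from google.cloud import bigquery

client = bigquery.Client(project="...", client_options=...)
# statement_type and total_bytes_processed exist only on query jobs.
for job in client.list_jobs(state_filter="done", max_results=50):
    stats = (getattr(job, "statement_type", None), getattr(job, "total_bytes_processed", None))
    print(job.job_id, *stats)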
See conformance-coverage-matrix.md
for the INFORMATION_SCHEMA coverage inventory. Goccy's
bigquery-emulator also defers this surface; the emulator's
parity-with-goccy stance keeps it out of scope.
Google Cloud Storage emulation¶
Status: Excluded permanently — the emulator's charter is BigQuery, not GCS.
bqemulator implements the BigQuery REST + gRPC surface. Real BigQuery treats Google Cloud Storage as an external service; the emulator follows the same separation. Implementing a GCS HTTP/JSON-API surface inside the emulator would expand scope from "BigQuery" to "BigQuery + GCS" — a substantially larger maintenance surface for a feature real BigQuery doesn't include.
The emulator's existing BQEMU_GCS_LOCAL_ROOT shim
(ADR 0027) is a filesystem
resolver for gs:// URIs that appear in LOAD / EXTRACT
sourceUris — it maps gs://bucket/path to a local filesystem
path, so a test that pre-stages files on disk can load them. It is
not a GCS API emulator. Anything that needs the actual GCS JSON API
(Beam's BigQueryIO.Write BATCH_LOADS staging step, the Java SDK's
Storage.objects.insert, signed URLs, multipart uploads) must
target a separate GCS emulator.
Workaround for Beam BigQueryIO BATCH_LOADS specifically:
fsouza/fake-gcs-server
ships a Docker image that implements the GCS HTTP/JSON API and stores
objects at {root}/{bucket}/{object} — byte-identical with
BQEMU_GCS_LOCAL_ROOT's expected layout. The scio example
(docs/examples/java/scio/) brings
both containers up with a shared bind mount: Beam stages BATCH_LOADS
shards via fake-gcs-server (which materialises them on disk),
bqemulator's LOAD job reads the same bytes via its filesystem
resolver. See ADR 0034
for the full design.
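The {root}/{bucket}/{object} layout described above amounts to a plain path mapping. The sketch below shows that mapping in isolation; `resolve_gs_uri` is a hypothetical name rather than the emulator's resolver, and the sketch does not expand wildcards or guard against ".." segments.

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit


def resolve_gs_uri(uri: str, local_root: str) -> PurePosixPath:
    """Map gs://bucket/object to {local_root}/{bucket}/{object}."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # netloc carries the bucket; the path carries the object name.
    return PurePosixPath(local_root) / parts.netloc / parts.path.lstrip("/")
```

Because fake-gcs-server writes objects under the same layout, pointing both containers at one shared directory is enough for the LOAD job to find the staged files.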
Reconsidering: an in-process GCS emulation surface would need an RFC demonstrating use cases the sidecar pattern does not cover. The sidecar adds one container to a test fixture; the in-process alternative would add an entire HTTP/JSON API + multipart upload + signed URL surface to bqemulator. The cost/benefit currently favours the sidecar.
Native Windows containers¶
The published image (ghcr.io/jjviscomi/bqemulator) is a multi-arch
Linux image (linux/amd64,linux/arm64). A separate Windows-container
variant (mcr.microsoft.com/windows/nanoserver or servercore base,
with a windows/amd64-tagged manifest entry) is not shipped.
Rationale: Windows-native containers and Linux containers share no
filesystem layers, so supporting both is a parallel build pipeline
with its own CI matrix, runner cost, and dependency-verification
burden. Several of the emulator's native dependencies have historically
been less reliable on Windows containers — the V8 embedding via
mini-racer (UDF runtime), grpc.aio (which uses ProactorEventLoop
on Windows with documented edge cases against the selector loop the
rest of the codebase assumes), and Storage Read API Avro
materialization in particular. Validating all of these on every
release more than doubles the e2e matrix runtime, against a small
marginal audience — in 2026, ~all Windows backend-Python workflows
run via WSL2 + Docker Desktop with the existing Linux image and
require no changes from us.
Workaround for Windows users: install Docker Desktop for
Windows with the
WSL2 backend (the default since Docker Desktop 4.x). The published
Linux image then runs natively under WSL2 — including all networking,
volume mounts, and the published bqemulator CLI. No
Windows-specific configuration is required by the emulator itself.
Reconsidering: open an RFC documenting a real-world WSL2-forbidden use case (e.g. a corporate policy that forbids enabling the Linux subsystem on developer laptops). A native Windows variant is a candidate for v2 if the gap is widely felt.
Reconsidering¶
Every exclusion above has been considered during design. To re-open:
- Open an RFC describing the use case and proposed implementation.
- The TSC decides by consensus or, failing that, majority vote.
- On acceptance, an ADR supersedes the relevant section here.