Using INFORMATION_SCHEMA against the emulator¶
BigQuery exposes a family of virtual catalog views under
{project}.{dataset}.INFORMATION_SCHEMA.* that describe the
catalog itself: datasets, tables, columns, options, views,
partitions, routines, materialized views, and row-access policies.
The emulator implements 9 of the 10 published views — every
view BigQuery ships except the JOBS family. The reference
emulator (goccy/bigquery-emulator) covers a subset (4 views);
bqemulator is a superset.
Supported views¶
| View | Source of truth |
|---|---|
SCHEMATA |
catalog list_datasets() |
TABLES |
catalog list_tables() |
COLUMNS |
per-table TableSchema.fields |
TABLE_OPTIONS |
TableMeta.description / friendly_name / labels / expiration_time / time_partitioning.require_partition_filter |
VIEWS |
tables filtered to table_type='VIEW' |
PARTITIONS |
live DuckDB GROUP-BY on the partitioning column |
ROUTINES |
catalog list_routines() |
MATERIALIZED_VIEWS |
catalog list_materialized_views() |
ROW_ACCESS_POLICIES |
catalog list_all_row_access_policies() |
Scope-qualified references¶
All three BigQuery scope forms work:
-- Three-part: project.dataset.INFORMATION_SCHEMA.X
SELECT * FROM `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`;
-- Two-part: dataset.INFORMATION_SCHEMA.X (current project)
SELECT * FROM `my_dataset.INFORMATION_SCHEMA.TABLES`;
-- Unqualified: INFORMATION_SCHEMA.X — SCHEMATA only (per BQ docs;
-- other views require at least a dataset prefix).
SELECT * FROM INFORMATION_SCHEMA.SCHEMATA;
For SCHEMATA, BigQuery's canonical form uses a region-X anchor
(e.g. region-us.INFORMATION_SCHEMA.SCHEMATA); the emulator's
permissive regex accepts it and lists all datasets in the project.
Examples¶
Find every table in a dataset (dbt / Looker pattern)¶
SELECT table_name, table_type, is_insertable_into, creation_time
FROM `${PROJECT}.${DATASET}.INFORMATION_SCHEMA.TABLES`
WHERE table_type = 'BASE TABLE'
ORDER BY table_name;
Discover the column schema of a table (Dataform pattern)¶
SELECT column_name, ordinal_position, data_type, is_nullable, is_partitioning_column
FROM `${PROJECT}.${DATASET}.INFORMATION_SCHEMA.COLUMNS`
WHERE table_name = 'orders'
ORDER BY ordinal_position;
Enumerate the partitions of a partitioned table¶
SELECT partition_id, total_rows, last_modified_time, storage_tier
FROM `${PROJECT}.${DATASET}.INFORMATION_SCHEMA.PARTITIONS`
WHERE table_name = 'events'
ORDER BY partition_id;
Read a table's options (description, labels, partition filter)¶
SELECT option_name, option_type, option_value
FROM `${PROJECT}.${DATASET}.INFORMATION_SCHEMA.TABLE_OPTIONS`
WHERE table_name = 'orders';
Inspect a view's body¶
SELECT table_name, view_definition, use_standard_sql
FROM `${PROJECT}.${DATASET}.INFORMATION_SCHEMA.VIEWS`
WHERE table_name = 'monthly_summary';
Out of scope — INFORMATION_SCHEMA.JOBS*¶
The JOBS / JOBS_BY_USER / JOBS_BY_PROJECT / JOBS_BY_FOLDER
/ JOBS_BY_ORGANIZATION views are permanently
out of scope.
Job history in the emulator is in-memory only; the
INFORMATION_SCHEMA.JOBS* views are primarily billing/quota
observability surfaces and the emulator has no billing model.
Use the REST jobs.list endpoint instead:
from google.cloud import bigquery
client = bigquery.Client(...)
for job in client.list_jobs(state_filter="DONE"):
print(job.job_id, job.statement_type, job.total_bytes_processed)
How it works under the hood¶
The emulator implements INFORMATION_SCHEMA via a pre-translation
rewriter (src/bqemulator/sql/rewriter/information_schema.py)
that detects INFORMATION_SCHEMA.<view> references in the BQ SQL
input and replaces each with an inline VALUES subquery
materialised from the catalog. No DuckDB-side virtual tables are
exposed to user queries; the rewriter runs in the same translator
phase as the wildcard expander, before the BigQuery → DuckDB
transpile step.
This design keeps the implementation simple, makes the column
schema match BigQuery's documented contract byte-for-byte, and
avoids dragging DuckDB's native information_schema columns
(different shape, different types) into the BigQuery surface.
For PARTITIONS, the rewriter queries the catalog's
list_partitions() helper which in turn runs a GROUP BY on the
partitioning column against the table's physical DuckDB storage —
this is the one view that touches live row data instead of pure
catalog metadata.