Architecture overview¶
bqemulator is a single Python process that serves two protocol surfaces (REST + gRPC) against one shared data engine (DuckDB).
High-level diagram¶
┌──────────────────────────────────────────────────────────────────┐
│ bqemulator process │
│ │
│ ┌────────────────┐ ┌───────────────────────────┐ │
│ │ FastAPI │ │ grpc.aio │ │
│ │ (uvicorn) │ │ │ │
│ │ │ │ BigQueryRead │ │
│ │ /bigquery/v2 │ │ BigQueryWrite │ │
│ │ /healthz │ │ grpc.health │ │
│ │ /readyz │ │ │ │
│ │ /metrics │ │ │ │
│ └────────┬───────┘ └─────────┬─────────────────┘ │
│ │ │ │
│ └──────────────┬───────────────┘ │
│ │ │
│ ┌─────────▼────────┐ │
│ │ AppContext │ │
│ │ (composition │ │
│ │ root output) │ │
│ └────────┬─────────┘ │
│ │ │
│ ┌───────────┬───────────┼───────────┬────────────┐ │
│ │ │ │ │ │ │
│ ┌▼──────┐ ┌──▼────┐ ┌────▼────┐ ┌────▼──────┐ ┌───▼──────┐ │
│ │catalog│ │ sql │ │ jobs │ │ streaming│ │ versioning│ │
│ │ │ │ │ │ │ │ │ │ │ │
│ └──┬────┘ └───┬───┘ └────┬────┘ └─────┬─────┘ └─────┬────┘ │
│ │ │ │ │ │ │
│ └──────────┴──────────┼────────────┴──────────────┘ │
│ │ │
│ ┌────────▼──────────┐ │
│ │ DuckDBEngine │ │
│ │ (single conn + │ │
│ │ asyncio lock) │ │
│ └────────┬──────────┘ │
│ │ │
│ ┌─────▼────────┐ │
│ │ DuckDB │ │
│ │ (.duckdb or │ │
│ │ :memory:) │ │
│ └──────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Layers¶
| Layer | Role | Import rule |
|---|---|---|
bqemulator.domain |
Pure types, errors, protocols | No framework imports |
bqemulator.catalog |
Metadata storage (Repository pattern) | Domain + storage only |
bqemulator.storage |
DuckDB lifecycle and primitives | Domain only |
bqemulator.sql |
SQLGlot translation + rules | Domain + storage + catalog |
bqemulator.scripting |
Procedural SQL interpreter | Domain + sql + storage |
bqemulator.udf |
UDF runtimes (SQL, JS, TVF) | Domain + storage + catalog |
bqemulator.jobs |
Query/load/extract/copy/snapshot execution | Domain + sql + storage + catalog |
bqemulator.streaming |
Storage Read/Write APIs | Domain + storage + catalog |
bqemulator.versioning |
Snapshots, time travel, materialized views | Domain + storage + catalog |
bqemulator.types |
GEOGRAPHY, RANGE, INTERVAL handling | Domain + storage |
bqemulator.api |
FastAPI REST adapter | All layers above |
bqemulator.grpc_api |
grpc.aio adapter | All layers above |
bqemulator.observability |
Logging, metrics, tracing | Settings + FastAPI |
bqemulator.server |
Composition root | Everything |
Imports flow downward only. Domain modules never import from adapters.
Request flow (REST query example)¶
- Client POSTs to
/bigquery/v2/projects/{p}/queries. CorrelationIdMiddlewarebinds a request id into the logging context.AccessLogMiddlewarerecords a log entry with timing.MetricsMiddlewarestarts a histogram timer.- FastAPI routes the request to a handler in
api/routes/jobs.py. - Handler constructs a
QueryJobCommandand dispatches viajobs/executor.py. - Executor invokes
SQLTranslator.translate(), thenDuckDBEngine.fetch_arrow(). - Result is converted to BigQuery REST JSON via
storage/arrow_bridge.py. - Response flows back up; middleware records final status and timing.
Key invariants¶
- Single DuckDB connection. All reads and writes go through
DuckDBEngine. Writes serialize onasyncio.Lock. - Immutable domain models. Every catalog entity is a frozen Pydantic
model; mutation goes through
.model_copy(update=...). - UTC-only timestamps. DuckDB connection is opened with
SET TimeZone = 'UTC'; every TIMESTAMP round-trip is UTC. - Errors map to BigQuery ErrorProto. Every
DomainErrorsubclass has an HTTP status, BQreason, and gRPC canonical status.
See each subsystem's own architecture page for details.