Skip to content

ADR 0001: Use DuckDB as the storage and execution engine

  • Status: Accepted
  • Supersedes: none
  • Superseded by: none

Context

bqemulator needs an embedded SQL engine that can execute analytical queries locally with no external dependencies. Candidates considered:

Engine Pros Cons
DuckDB Columnar analytical engine; excellent SQL coverage; ARRAY/STRUCT/JSON native; Arrow interop; tier-1 Python bindings; stable on-disk format Single-writer model; some BQ features absent (GEOGRAPHY without ext)
SQLite Ubiquitous; embedded Row-oriented; weak analytics; no native ARRAY/STRUCT; goccy's chosen engine and source of its SQL gaps
In-memory custom Maximum fidelity Enormous implementation burden
ClickHouse / Databend Columnar, analytical Not embedded; requires a server process

Decision

Use DuckDB. It is the only embedded engine that matches BigQuery's analytical character, and it brings first-class ARRAY/STRUCT/JSON types that map cleanly to GoogleSQL semantics.

Consequences

  • Positive: mature, fast, embedded, no subprocess; Arrow interop makes Storage Read API implementation straightforward; spatial extension gives us GEOGRAPHY; active development and a stable v1.0+ format.
  • Negative: a single writer at a time — writes serialize on an asyncio lock. Acceptable for emulator workloads; documented characteristic.
  • Negative: a few BigQuery features (time travel, materialized views, partitioning) have no native DuckDB equivalent and must be built as a layer on top. Accepted; see ADRs 0006, 0009.