ADR 0015: Scripting execution model (lexer/parser/AST/interpreter with signal-based control flow)
- Status: Accepted
Context
ADR 0011 locked the tree-walking interpreter for BigQuery procedural scripting. That ADR left open concrete design points that must be decided before implementation:
- How to parse BigQuery scripting syntax given that SQLGlot's parser falls back to a raw Command node for most control-flow constructs.
- How non-local control flow (BREAK, CONTINUE, RETURN, exception propagation) flows through the interpreter.
- How lexical scope works across nested BEGIN/END blocks and across procedure calls.
- How dynamic SQL (EXECUTE IMMEDIATE) evaluates parameters and returns result rows.
Three implementation options were considered:
- Pure SQLGlot fallback. Rely on sqlglot.parse and treat every fallback Command node as opaque. Rejected: SQLGlot rolls IF/WHILE bodies into a single Command, so we can never walk into them.
- Preprocess to one-statement-per-call. Split the script on semicolons, feed each statement to the translator. Rejected: control flow boundaries (IF ... END IF) span multiple statements.
- Custom lexer + recursive-descent parser + AST nodes. Tokenise with a small lexer that understands BigQuery keywords and strings, parse into a typed AST covering every Phase 6 scripting construct, walk the AST.
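The rejection of the second option can be illustrated with a few lines of Python. This is only a sketch: the script text and the naive_split helper are hypothetical and not part of the emulator.

```python
# Hypothetical script; the IF block spans every statement in it.
SCRIPT = "IF x > 0 THEN SELECT 1; SELECT 2; END IF;"


def naive_split(script: str) -> list[str]:
    """Split on semicolons with no awareness of block structure."""
    return [part.strip() for part in script.split(";") if part.strip()]


fragments = naive_split(SCRIPT)
for fragment in fragments:
    print(repr(fragment))
```

The first fragment opens an IF that it never closes, the second loses its condition, and the third is a dangling END IF, so none of them can be handed to the translator as an independent statement.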
Decision
Option 3: a small self-contained lexer + parser + AST + interpreter
in bqemulator.scripting.
Module layout
scripting/
├── ast.py # frozen dataclasses: Statement + Expression hierarchies
├── lexer.py # token stream with BigQuery-aware string/identifier rules
├── parser.py # recursive-descent parser → AST
├── frames.py # FrameStack with push/pop/declare/set/lookup
├── exceptions.py # ScriptRaise + control-flow signals
└── interpreter.py # walks AST + executes SQL statements via the engine
Parser scope
Covers the full Phase 6 surface:
- Declarations: DECLARE name [, name]* TYPE [DEFAULT expr];
- Assignment: SET name = expr; and SET (a, b) = (SELECT ...);
- Conditionals: IF expr THEN ... ELSEIF expr THEN ... ELSE ... END IF;
- Loops: WHILE expr DO ... END WHILE; LOOP ... END LOOP; FOR name IN (SELECT ...) DO ... END FOR;
- Branch control: BREAK; / LEAVE; and CONTINUE; / ITERATE;
- Blocks: BEGIN ... [EXCEPTION WHEN ERROR THEN ...] END;
- Dynamic SQL: EXECUTE IMMEDIATE sql_expr [INTO names] [USING values];
- Invocations: CALL proj.ds.proc(args); and RETURN [expr];
- Nested DDL: CREATE [OR REPLACE] [TEMP] FUNCTION/PROCEDURE ...
Every statement not recognised by the scripting parser is passed through to the existing SQL translator as a single SQL statement. This keeps the parser narrowly focused on control flow and defers every data-plane statement (SELECT, INSERT, UPDATE, MERGE, DELETE, TRUNCATE, CREATE TABLE) to the SQL pipeline. If the body is a single SQL statement, the script interpreter is not even entered.
Signal-based control flow
Non-local transfer is represented by exceptions that inherit from a
module-private _ControlSignal base (not a DomainError):
- BreakSignal: caught by the loop frame.
- ContinueSignal: caught by the loop frame.
- ReturnSignal(value): caught by the procedure-call frame; bubbles up through nested loops/blocks without being absorbed.
- ScriptRaise(domain_error): caught by the nearest matching EXCEPTION WHEN handler.
Because every signal is an exception, the interpreter can let them
bubble up through nested execute_* dispatch methods; each construct
that should absorb a given signal simply catches it at the right layer.
Any DomainError raised during SQL execution is wrapped in a
ScriptRaise so handlers match it uniformly.
Frame stack + lexical scope
- Frame holds a dict of name → value and a reference to its parent.
- FrameStack.push() opens a new frame; pop() discards it.
- declare(name, type, default) inserts in the current frame only; shadowing an outer name is a parse-time error to match BigQuery.
- set(name, value) walks outward to find the first frame owning the name; SET nonexistent = ... raises an InvalidQueryError.
- lookup(name) walks outward; unresolved names become InvalidQueryError. Type coercion uses the declared BigQuery type.
Procedures open a new frame with only the parameter bindings — they do not see the caller's locals. This matches BigQuery's stored-procedure scoping (arguments + session variables, nothing else).
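The scoping rules above could be sketched roughly as follows. This is an illustration under assumptions, not the actual frames.py: InvalidQueryError is redefined locally as a stand-in, declare takes only a name and a value (type coercion and the parse-time shadowing check are omitted), and the isolated flag on push is a hypothetical way to model a procedure call.

```python
from typing import Any, Optional


class InvalidQueryError(Exception):
    """Local stand-in for the emulator's error type."""


class Frame:
    def __init__(self, parent: Optional["Frame"] = None) -> None:
        self.vars: dict[str, Any] = {}
        self.parent = parent  # lexical parent; None for root or procedure frames


class FrameStack:
    def __init__(self) -> None:
        self._frames = [Frame()]

    @property
    def current(self) -> Frame:
        return self._frames[-1]

    def push(self, *, isolated: bool = False) -> None:
        # isolated=True (assumed flag) cuts the link to the caller's frames,
        # which is how a procedure call avoids seeing the caller's locals.
        parent = None if isolated else self.current
        self._frames.append(Frame(parent))

    def pop(self) -> None:
        if len(self._frames) == 1:
            raise RuntimeError("cannot pop the root frame")
        self._frames.pop()

    def declare(self, name: str, value: Any = None) -> None:
        # Inserts in the current frame only.
        self.current.vars[name] = value

    def _owner(self, name: str) -> Frame:
        frame: Optional[Frame] = self.current
        while frame is not None:  # walk outward through lexical parents
            if name in frame.vars:
                return frame
            frame = frame.parent
        raise InvalidQueryError(f"unresolved variable: {name}")

    def set(self, name: str, value: Any) -> None:
        self._owner(name).vars[name] = value

    def lookup(self, name: str) -> Any:
        return self._owner(name).vars[name]


stack = FrameStack()
stack.declare("total", 0)

stack.push()                # nested BEGIN ... END block
stack.set("total", 5)       # resolves to the outer frame that owns the name
stack.declare("tmp", 1)     # local to the nested block
stack.pop()

stack.push(isolated=True)   # procedure call: parameters only
stack.declare("arg", 7)
try:
    stack.lookup("total")
    caller_visible = True
except InvalidQueryError:
    caller_visible = False
stack.pop()
```

Keeping the call stack (the list) separate from the lexical parent link is one way to let pop() return to the caller even though the procedure frame cannot resolve the caller's names.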
Expression evaluator
Variables inside scripting expressions are resolved by the interpreter
before the SQL is handed to the engine: every @var_name reference
in a SELECT / SET / INSERT body is rewritten to a DuckDB $param and
the resolved value is passed as a bound parameter. This lets ordinary
SQL use script variables without a string-concat vulnerability.
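As a rough illustration of the rewrite, the sketch below turns @name references into $name placeholders and collects the values to bind. The function name bind_script_variables is hypothetical, and the regex is a simplification: it would also match an @name that appears inside a string literal, so a real implementation would presumably work on the lexer's token stream instead.

```python
import re
from typing import Any, Mapping

# Simplified pattern for an @variable reference (assumption: ASCII identifiers).
_VAR_REF = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def bind_script_variables(
    sql: str, variables: Mapping[str, Any]
) -> tuple[str, dict[str, Any]]:
    """Return the rewritten SQL plus the named parameters it references."""
    params: dict[str, Any] = {}

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unresolved script variable: {name}")
        params[name] = variables[name]
        return f"${name}"

    return _VAR_REF.sub(replace, sql), params


sql, params = bind_script_variables(
    "SELECT * FROM t WHERE name = @who AND age > @min_age",
    {"who": "x'; DROP TABLE t; --", "min_age": 18, "unused": 1},
)
print(sql)
print(params)
```

The hostile value for who ends up only in the parameter dict, never in the SQL text, which is the property the paragraph above relies on.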
EXECUTE IMMEDIATE
EXECUTE IMMEDIATE sql_expr first evaluates sql_expr to a string,
then runs the string through the full translation pipeline (wildcard
expander + translator + table rewriter). USING v1, v2 binds
positional parameters; INTO v1, v2 writes the first row of the result
into the named variables (error on multi-row results).
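The INTO step might look like the sketch below. The bind_into helper and the local InvalidQueryError class are hypothetical. Two behaviours are assumptions not stated in this ADR: an empty result sets every target to NULL (None), and a column-count mismatch is an error.

```python
from typing import Any, Sequence


class InvalidQueryError(Exception):
    """Local stand-in for the emulator's error type."""


def bind_into(names: Sequence[str], rows: Sequence[Sequence[Any]]) -> dict[str, Any]:
    """Map the single result row onto the INTO targets."""
    if len(rows) > 1:
        raise InvalidQueryError("EXECUTE IMMEDIATE ... INTO got more than one row")
    if not rows:
        # Assumption: no rows means every target becomes NULL.
        return {name: None for name in names}
    row = rows[0]
    if len(row) != len(names):
        # Assumption: column count must match the number of INTO targets.
        raise InvalidQueryError(
            f"INTO expects {len(names)} column(s), result has {len(row)}"
        )
    return dict(zip(names, row))


assigned = bind_into(["a", "b"], [(1, "x")])
empty = bind_into(["a", "b"], [])
```

The returned mapping would then be written back through the frame stack's set operation, so the INTO targets follow the same scoping rules as an ordinary SET.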
Result accumulation
The top-level run_script(ctx, sql) returns the final SELECT result
(if the last executed statement was a query), matching BigQuery's
scripting job statistics shape. Earlier queries emit a structured log
event but do not accumulate into an output table — jobs only stream
back one result at a time.
Consequences
- Positive: Correct by construction across every Phase 6 scripting construct. No hidden fallbacks.
- Positive: Every data-plane statement still goes through the existing SQL rule registry — one pipeline, one place to audit.
- Positive: Signals are plain exceptions → Python's existing stack-unwinding semantics drive the right behaviour without a bespoke trampoline.
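- Positive: SQL injection through script variables is ruled out at the scripting/SQL boundary because they always reach DuckDB as bound parameters, never as concatenated text. (A SQL string that a script assembles itself for EXECUTE IMMEDIATE is still executed as written.)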
- Negative: A small custom parser is code we now own. Mitigated by: aggressive unit coverage + Hypothesis property tests + conformance tests against real BigQuery scripting output.
- Negative: Scripts that rely on BigQuery's raw error-message text will see a different (but stable) shape from the emulator. Documented in the scripting guide.