Routines and UDFs¶
bqemulator supports four routine kinds, matching BigQuery's public surface. All four are first-class — created, invoked, and introspected exactly as you would against the real service.
| Kind | Language | Runtime | Examples |
|---|---|---|---|
| SQL scalar UDF | SQL | DuckDB CREATE MACRO |
add_one(x), format_money(cents) |
| Table-valued function | SQL | DuckDB CREATE MACRO... AS TABLE |
recent_events(cutoff) |
| JavaScript scalar UDF | JavaScript | Embedded V8 via mini-racer |
Regex, JSON, custom math |
| Stored procedure | SQL procedural | Scripting interpreter (ADR 0015) | create_orders(qty) |
Under the hood, the UDFRegistry (see
ADR 0014) eagerly
registers each routine with DuckDB on REST CRUD and re-hydrates on
startup, so CREATE-time syntax errors fire immediately and invocations
never pay registration cost.
Quick start — SQL scalar UDF¶
from google.cloud import bigquery
client = bigquery.Client(project="my-project", ...)
ref = bigquery.RoutineReference.from_string("my-project.my_ds.add_one")
routine = bigquery.Routine(ref)
routine.type_ = "SCALAR_FUNCTION"
routine.language = "SQL"
routine.arguments = [
bigquery.RoutineArgument(
name="x",
data_type=bigquery.StandardSqlDataType(
type_kind=bigquery.StandardSqlTypeNames.INT64,
),
),
]
routine.return_type = bigquery.StandardSqlDataType(
type_kind=bigquery.StandardSqlTypeNames.INT64,
)
routine.body = "x + 1"
client.create_routine(routine)
job = client.query("SELECT my_ds.add_one(41) AS answer")
print(next(job.result())["answer"]) # -> 42
Table-valued function¶
A TVF returns a table; its body is a full SELECT and it is
referenced in a FROM clause.
routine.type_ = "TABLE_VALUED_FUNCTION"
routine.language = "SQL"
routine.arguments = [
bigquery.RoutineArgument(
name="n",
data_type=bigquery.StandardSqlDataType(
type_kind=bigquery.StandardSqlTypeNames.INT64,
),
),
]
routine.body = "SELECT i AS value FROM UNNEST(GENERATE_ARRAY(1, n)) AS i"
client.create_routine(routine)
job = client.query("SELECT SUM(value) AS total FROM my_ds.one_to_n(10)")
print(next(job.result())["total"]) # -> 55
JavaScript scalar UDF¶
JavaScript UDFs run in an isolated V8 context per routine. Arguments arrive as standard JSON types; return values round-trip through JSON.
routine.type_ = "SCALAR_FUNCTION"
routine.language = "JAVASCRIPT"
routine.arguments = [
bigquery.RoutineArgument(
name="x",
data_type=bigquery.StandardSqlDataType(
type_kind=bigquery.StandardSqlTypeNames.INT64,
),
),
]
routine.return_type = bigquery.StandardSqlDataType(
type_kind=bigquery.StandardSqlTypeNames.INT64,
)
routine.body = "return x * 2;"
client.create_routine(routine)
Sandboxing¶
- No network.
mini-racerhas nofetch/XMLHttpRequest. - No filesystem. No
require,import, orreadFileSync. - Per-routine memory cap (
BQEMU_UDF_JS_MEMORY_BYTES— default 256 MiB). Exceeding it raisesInvalidQueryErrorwith a clear message. - Per-invocation CPU timeout (
BQEMU_UDF_JS_TIMEOUT_MS— default 5 000 ms). Best-effort: under some asyncio contexts the timeout falls back to the memory cap only. We document this trade-off in ADR 0014.
Stored procedures¶
Procedures are named scripts. They may use every scripting construct
(see scripting.md) and are invoked with CALL.
CREATE OR REPLACE PROCEDURE my_ds.square(x INT64)
BEGIN
SELECT x * x;
END;
CALL my_ds.square(7); -- -> 49
Procedures open a fresh lexical scope on entry; they do not see the caller's local variables. Arguments are the only bindings in the procedure's initial frame.
INFORMATION_SCHEMA.ROUTINES¶
The emulator exposes the same catalog view BigQuery publishes:
SELECT routine_name, routine_type, language, ddl
FROM my_ds.INFORMATION_SCHEMA.ROUTINES
ORDER BY routine_name;
Columns:
| Column | Description |
|---|---|
specific_catalog / routine_catalog |
project id |
specific_schema / routine_schema |
dataset id |
specific_name / routine_name |
routine id |
routine_type |
SCALAR_FUNCTION, TABLE_VALUED_FUNCTION, PROCEDURE |
language |
SQL or JAVASCRIPT |
routine_body |
SQL or EXTERNAL |
data_type |
return type kind (scalar UDFs only) |
created / last_altered |
routine timestamps |
ddl |
synthesised CREATE FUNCTION/PROCEDURE DDL |
Configuration¶
| Environment variable | Default | Notes |
|---|---|---|
BQEMU_UDF_JS_TIMEOUT_MS |
5 000 | Per-invocation CPU cap. |
BQEMU_UDF_JS_MEMORY_BYTES |
268 435 456 (256 MiB) | V8 heap cap. |
BQEMU_SCRIPTING_MAX_STATEMENTS |
10 000 | Per-script statement cap. |
BQEMU_SCRIPTING_MAX_LOOP_ITERATIONS |
1 000 000 | Per-loop iteration cap. |
Known limitations¶
- Named
@query parameters cannot be used from inside a script body; reference script variables by their declared name instead. EXCEPTION WHENonly supportsWHEN ERROR THEN— specific condition names are out of scope for v1 (see out-of-scope.md).- JavaScript UDFs may block a worker thread beyond
cpu_timeout_msin async contexts; the hard memory cap still applies and prevents unbounded allocation. This is documented explicitly in ADR 0014.