ADR 0038: SESSION_USER() — caller-identity-bound pre-translator substitution¶
- Status: Accepted
Context¶
BigQuery's SESSION_USER()
returns the email address of the user running the query —
alice@example.com for a user principal,
sa@project.iam.gserviceaccount.com for a service account. The
canonical production use case is "tenant isolation by email domain"
in row-access policy filters:
CREATE ROW ACCESS POLICY tenant_by_session_user ON dataset.tenants
GRANT TO ('allAuthenticatedUsers')
FILTER USING (REGEXP_EXTRACT(SESSION_USER(), r'@(.+)$') = tenant_id);
The pattern resolves the caller's email, extracts the domain, and
matches it against a per-row tenant key — so a caller from
@example.com sees only the example.com rows without any
explicit grantee list (which would otherwise need to enumerate every
authenticated user). Real BigQuery users rely on this; the emulator
must support it.
Pre-PR audit findings (verified at the start of the work):
- Zero implementation under
src/bqemulator/sql/— the function was never wired. - Zero tests anywhere (
grep -rn 'SESSION_USER' tests/returned only the surface-inventory entry). - The surface-inventory entry (
tests/conformance/_surface_inventory.py) rendered in the conformance-coverage matrix as⚪ Excluded by ADR 0022 §1.2 — session-state dependent. Exercised at the unit tier.The last sentence was aspirational — the unit-tier coverage didn't exist. - DuckDB's native
SESSION_USERresolves to the literal'duckdb'(the DuckDB connection's OS identity), not the BigQuery caller — so a query likeSELECT SESSION_USER()would silently return'duckdb'and any RAP filter built on it would match no real caller. - The five-client E2E charter (AGENTS.md "Testing expectations": "every new BigQuery surface item gets a SurfaceItem entry... never skip an e2e test language") was unmet for this surface.
This ADR closes that gap with both the implementation and the end-to-end client coverage.
Decision¶
Implement SESSION_USER() as a pre-translator substitution
pass that takes the per-query CallerIdentity and rewrites
every SESSION_USER() call site in the SQL to a string literal of
the caller's resolved email before SQLGlot's BigQuery → DuckDB
transpile runs. The substitution happens inside
SQLTranslator.translate(), so it covers every place the
translator runs: bare queries, scripts, RAP filter predicates
(inlined by rewrite_for_row_access), view bodies (also inlined
by the same pass), and the gRPC Storage Read row-filter rewrite when
caller context is available.
Three options considered¶
| Option | Shape | Decision |
|---|---|---|
| A — SQL pre-translator | New rewrite_session_user(sql, caller) AST walk substitutes SessionUser nodes with Literal(email) before SQLGlot transpiles. Needs caller plumbed into Translator.translate. |
✅ Chosen. |
| B — RAP-rewriter-only substitution | Substitute only inside policy filters. Bare SELECT SESSION_USER() raises UnsupportedFeatureError. |
Rejected. Narrower than BigQuery's contract; surprises users. |
| C — DuckDB scalar UDF reading a session variable | Register a Python-side bqemu_session_user() UDF that reads per-query state from a context variable. Translator rewrites SESSION_USER() → bqemu_session_user(). |
Rejected. DuckDB's connection is shared across requests; per-call state plumbing through a UDF context is more invasive than threading caller directly. |
Option A wins on three axes:
- BigQuery semantic fidelity.
SESSION_USER()works everywhere SQL runs — not just inside RAP filters. - Existing infrastructure reuse. The row-access enforcement
path already constructs
CallerIdentityper-request and threads it throughrewrite_for_row_access; extending the same value one layer down (to the translator) is additive and uses an existing dataclass. - Smallest concept surface. Option C would add a new session-
state primitive to the translator and require per-query DuckDB
SETsemantics; Option A is a single new rewriter module + an optional kwarg on one function.
The cost is plumbing caller: CallerIdentity | None through
five call sites (jobs/executor.py, three sites in
scripting/interpreter.py, one in grpc_api/read_servicer.py).
Each is additive — the new parameter defaults to None, which
folds to the unauthenticated fallback identity, so the change is
backward-compatible by construction.
Resolution contract¶
The pure resolver
:func:bqemulator.sql.rewriter.session_user.resolve_session_user
maps a CallerIdentity to the literal string SESSION_USER()
should return:
| Caller shape | Returned literal |
|---|---|
is_authenticated == False (DEFAULT_CALLER) |
"anonymous" (sentinel — RAP filters that compare against a tenant key deny every row) |
user:alice@example.com |
"alice@example.com" |
serviceAccount:sa@project.iam.gserviceaccount.com |
"sa@project.iam.gserviceaccount.com" |
group:admins@example.com |
"admins@example.com" (never appears via X-Bqemu-Caller in practice; tolerated defensively) |
domain:example.com |
"example.com" (same — defensive only) |
allUsers / allAuthenticatedUsers / unknown |
raw principal string (passthrough — these are grantee-side identifiers, not caller identifiers) |
Unauthenticated fallback¶
Real BigQuery never invokes SESSION_USER() from an
unauthenticated session — every API call carries a credential. The
emulator's DEFAULT_CALLER (user:anonymous@bqemulator.local
with is_authenticated=False) is purely a robustness fallback for
requests that arrive without an X-Bqemu-Caller header (most
common: integration tests that don't set the header explicitly).
For RAP filters comparing SESSION_USER() against a tenant key,
the "anonymous" sentinel ensures the filter denies every row,
which is the safe default. A documentation note in the
SESSION_USER reference (function-mapping regen) calls out this
behavior so users don't mistake an un-headered test for a real
deny-everything bug.
Sequencing of the pre-translator passes¶
rewrite_session_user runs as the first pre-translator pass
(immediately after the unsupported-keyword reject), ahead of every
other pre-translator in Translator.translate. The ordering
matters because:
- The row-access enforcement pass
(
rewrite_for_row_access) inlines policy filters into the user's query before the query reaches the translator. By the timerewrite_session_userruns, anySESSION_USER()inside a policy filter is already in the SQL string. - Downstream pre-translators (
rewrite_safe_helpers,rewrite_string_helpers,rewrite_datetime_helpers…) all expect either valid BigQuery SQL or a partial rewrite — none of them see a leftoverSESSION_USERbecause we substituted it to a plainSTRINGliteral first.
Rationale¶
Why a pre-translator and not a DuckDB pass-through¶
DuckDB has a built-in SESSION_USER system function. If the
translator left SESSION_USER() alone, SQLGlot would transpile it
to DuckDB's SESSION_USER (verified — see
tests/unit/sql/rewriter/test_session_user.py::test_translator_substitutes_session_user
for the regression pin), which resolves to the literal 'duckdb'
— the DuckDB connection's OS-side identity, not the BigQuery
caller. Every RAP filter built on SESSION_USER() would then
silently deny every legitimate caller. Pre-translator substitution
is the only fix.
Why store the substitution as a Literal and not a parameterised value¶
The natural alternative is to substitute SESSION_USER() with a
positional placeholder (?) and bind the email at query-execution
time. That would let DuckDB's parameter-binding layer carry the
type. But the literal approach is simpler:
- The translator already emits string literals for many other
rewrites (
rewrite_aggregate_variants,rewrite_decimal_literals, etc.). A newLiteral.stringnode is no special case. - The pre-translator pass is already running with full caller context — there's no late-binding benefit.
- Parameter binding would require coordinating the new parameter
with the existing positional-parameter pipeline
(
bind_parametersat the bottom ofjobs/executor.py); literal substitution skips the orchestration entirely.
Why not also rewrite CURRENT_USER / @@session.user¶
BigQuery's CURRENT_USER is a deprecated legacy spelling of
SESSION_USER; @@session.user is the system-variable form
(useful in stored procedures). Each is functionally identical to
SESSION_USER but adds parsing complexity (system variables in
particular use a different AST shape). Out-of-scope per chip prompt
— each is a separate follow-up PR if downstream users surface a
need.
Why SESSION_USER stays excluded from the conformance corpus¶
ADR 0022 §1.2 documents the exclusion: the function is session-state-dependent (its return value depends on which authenticated principal made the API call, which can't be deterministically captured in a recording). The integration suite records the substituted SQL (with the literal email) at the HTTP-request layer, which is reproducible — but the conformance corpus's recording semantics expect the raw user SQL, not the post-rewrite SQL. Keep the exclusion; the new unit + integration + e2e × 4 coverage closes the "exercised at the unit tier" gap that the surface-inventory entry promised.
Consequences¶
Positive¶
- The canonical RAP-with-caller-derived-filter production pattern works end-to-end through every conformance client.
- The surface-inventory promise ("Exercised at the unit tier") is now true — unit + integration + e2e × 4 coverage lands together.
- The pre-translator runs once and substitutes everywhere — no separate code path for RAP filters vs free-form queries vs script statements vs gRPC Storage Read row-restriction filters.
- The new
callerkwarg onTranslator.translateis the right shape for future context-dependent rewrites (e.g.@@project_idif we ever add it) — the architecture is now reusable.
Negative¶
- Five call sites had to pass
caller(jobs/executor.py, three inscripting/interpreter.py, one ingrpc_api/read_servicer.py). Future callers must follow the same pattern; documented in the function docstring. - The Storage Read filter pre-pass at
grpc_api/read_servicer.py:122(_build_filter_sqlfor therow_restrictionfield) doesn't yet receive caller context — it lives below the route handler that resolves identity. ASESSION_USER()in a Storage Readrow_restrictiontherefore folds to"anonymous"regardless of the actual caller. This is unusual (the canonical pattern is RAP filters, not Storage Read row_restriction) and is documented in the function docstring as a known limitation. SESSION_USER()inside a SQL UDF body is not re-substituted per-call (UDFs are pre-translated at definition time, when no caller exists). The function inside a UDF body folds to"anonymous"permanently — different from real BigQuery, which re-resolves the function per-invocation. Documented in the rewriter module docstring.
Neutral¶
- The new
CallerIdentityimport in the translator usesTYPE_CHECKINGto avoid a circular import (therow_access.identitymodule's transitive imports eventually pull in the translator); the runtime resolution of the fallback identity uses a function-scopeimportfor the same reason. Future contributors touchingTranslator.translateshould keep this shape. CallerIdentityis used in the translator as an opaque value type — onlyprincipalandis_authenticatedare read. This keeps the translator decoupled from the row-access matcher internals.
Alternatives considered¶
- Option B — RAP-rewriter-only substitution. Smaller code
surface (only
rewrite_for_row_accesstouches caller identity) but raisesUnsupportedFeatureErroron bareSELECT SESSION_USER(). Surprises users who'd reasonably expect the function to work in free-form queries. Rejected. - Option C — DuckDB scalar UDF. Register a Python-side
bqemu_session_user()UDF and have the translator rewriteSESSION_USER()→bqemu_session_user(). Per-call state would have to plumb through a context variable since DuckDB'sSET SESSIONis connection-scoped and the emulator shares the connection across requests. More invasive than threadingcallerdirectly. Rejected. - Per-thread session variable. Stash the caller in a
contextvars.ContextVarand have a DuckDB scalar UDF read it. Works in async contexts where each request gets its own context, but it adds an implicit dependency that's invisible at the translator surface. Rejected — explicitcallerkwarg is clearer. - Skip
SESSION_USER()entirely; document as unsupported. Closes the gap quickly but fails the chip prompt's stated goal (close the five-client e2e charter for this surface). Rejected. - Hand-authored conformance corpus fixture. Mentioned in the
chip prompt as optional; would require a
${PRINCIPAL}matrix-expander hook the corpus doesn't have today. The unit + integration + e2e × 4 tiers cover the surface; the conformance corpus exclusion (ADR 0022 §1.2) stands. No new fixture added in this PR.
References¶
- BigQuery
SESSION_USER()reference - ADR 0018 — the
CallerIdentitydataclass +X-Bqemu-Callerheader contract this ADR builds on. - ADR 0022 §1.2 / §7 — the
conformance-corpus exclusion for non-deterministic functions, which
SESSION_USER()stays inside. - ADR 0035 / ADR 0036 / ADR 0037 — the template / shape this ADR follows.
- AGENTS.md "Testing expectations" — the five-client e2e charter this ADR satisfies for the SESSION_USER surface.