Testing Airflow DAGs against bqemulator
Runs an Airflow DAG offline (no scheduler, no webserver) whose tasks
hit bqemulator instead of real BigQuery. The pytest suite exercises
the DAG via Airflow's TaskInstance.run() API.
Pairs with the Airflow integration guide.
What it demonstrates
- A DAG that uses BigQueryInsertJobOperator to create a dataset, load
  rows, and run an aggregate query.
- An Airflow connection (google_cloud_default) configured at test time
  via AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT so no Airflow metadata DB is
  required.
- Pointing the connection at bqemulator via the BIGQUERY_EMULATOR_HOST
  env var, set to the full URL including the http:// scheme (the
  Airflow Google provider forwards this value verbatim into
  client_options.api_endpoint and requests picks the adapter from the
  scheme).
- A session-scoped monkey-patch on google.auth.default() that returns
  AnonymousCredentials so the BQ hook never attempts a JWT grant
  against oauth2.googleapis.com/token. bqemulator doesn't validate
  auth, so the anonymous credentials sail through.
- Tests run each task in isolation via
  TaskInstance.run(test_mode=True).
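
The connection, endpoint, and credential settings above could be wired
up roughly as follows. This is a minimal sketch, not the suite's actual
conftest: the helper names emulator_env and anonymous_google_auth, the
default project ID, and the idea of passing the emulator URL as an
argument are all hypothetical, and the "project" extra key assumes a
recent Google provider release.

```python
import json
from contextlib import contextmanager
from unittest import mock


def emulator_env(emulator_url: str, project: str = "test-project") -> dict:
    """Build the two environment variables described above (hypothetical helper)."""
    # The value is used verbatim as the API endpoint, so the scheme is required.
    if not emulator_url.startswith(("http://", "https://")):
        raise ValueError(f"expected a full URL with scheme, got {emulator_url!r}")
    # Airflow 2.3+ accepts a JSON-serialized connection in AIRFLOW_CONN_* variables.
    # Assumption: the provider reads the project ID from the "project" extra.
    connection = {"conn_type": "google_cloud_platform", "extra": {"project": project}}
    return {
        "BIGQUERY_EMULATOR_HOST": emulator_url,
        "AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT": json.dumps(connection),
    }


@contextmanager
def anonymous_google_auth(project: str = "test-project"):
    """Make google.auth.default() return anonymous credentials while active."""
    # Imported lazily so this module still loads where google-auth is absent.
    from google.auth.credentials import AnonymousCredentials

    # google.auth.default() returns a (credentials, project_id) tuple.
    fake = (AnonymousCredentials(), project)
    with mock.patch("google.auth.default", return_value=fake):
        yield
```

A session-scoped fixture might enter
mock.patch.dict(os.environ, emulator_env(url)) and
anonymous_google_auth() together and yield inside them, so both the
environment and the patch are undone when the session ends.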
Layout
dags/load_customers_dag.py — DAG with three BigQuery tasks
tests/test_load_customers_dag.py — exercises each task against emulator
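
For orientation, the three tasks in dags/load_customers_dag.py could
each hand BigQueryInsertJobOperator a job configuration dict along these
lines. This is a sketch under assumptions rather than the DAG's actual
contents: the task keys, the customers table, its columns, and the
sample rows are hypothetical, and it assumes every step is expressed as
a standard-SQL query job.

```python
def job_configs(dataset: str) -> dict:
    """Return one BigQuery query-job configuration per task (all names hypothetical)."""

    def query_job(sql: str) -> dict:
        # Shape of a BigQuery job configuration for a standard-SQL query.
        return {"query": {"query": sql, "useLegacySql": False}}

    table = f"`{dataset}.customers`"
    return {
        # Task 1: create the dataset if it does not exist yet.
        "create_dataset": query_job(f"CREATE SCHEMA IF NOT EXISTS `{dataset}`"),
        # Task 2: load a few illustrative rows.
        "load_rows": query_job(
            f"CREATE OR REPLACE TABLE {table} AS "
            "SELECT 1 AS id, 'DE' AS country UNION ALL "
            "SELECT 2, 'DE' UNION ALL "
            "SELECT 3, 'FR'"
        ),
        # Task 3: aggregate customers per country.
        "aggregate": query_job(
            f"SELECT country, COUNT(*) AS n FROM {table} "
            "GROUP BY country ORDER BY country"
        ),
    }
```

Each value would be passed as the configuration argument of one
BigQueryInsertJobOperator. Because the dataset name is a parameter,
nothing in the configuration refers to the emulator.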
Run
make test runs pytest tests/. The bqemu_server fixture starts an
in-process emulator for the test session.
What to look for
- The DAG itself is production-shaped — no emulator-specific code.
- Test isolation: each test creates a unique dataset name via
  uuid.uuid4() and cleans up in a teardown.
- We do not spin up an Airflow scheduler; we use Airflow's
  task-instance API directly, the recommended pattern for DAG unit
  tests.
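
The test-isolation pattern above can be sketched without any Airflow
imports. This assumes a hypothetical drop callable that deletes a
dataset by name; the helper names and the prefix are illustrative too.
The sketch uses uuid4().hex rather than str(uuid4()) because BigQuery
dataset IDs allow only letters, digits, and underscores.

```python
import uuid
from contextlib import contextmanager
from typing import Callable, Iterator


def unique_dataset_name(prefix: str = "airflow_test") -> str:
    """Return a dataset ID that is unique per call and free of hyphens."""
    return f"{prefix}_{uuid.uuid4().hex}"


@contextmanager
def scratch_dataset(
    drop: Callable[[str], None], prefix: str = "airflow_test"
) -> Iterator[str]:
    """Yield a fresh dataset name and always call drop(name) afterwards."""
    name = unique_dataset_name(prefix)
    try:
        yield name
    finally:
        # Teardown runs even if the test body raises.
        drop(name)
```

In a pytest suite the same try/finally shape would typically live in a
function-scoped fixture that yields the name to the test.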