dbt Cloud + Databricks SQL Warehouse with microbatching (48h lookback) - intermittent failures

dheeraj98
New Contributor II

Hey everyone,

I'm currently running an hourly dbt Cloud job (27 models with 8 threads) on a Databricks SQL Warehouse using the dbt microbatch approach, with a 48-hour lookback window.

But I'm running into some recurring issues:

  • Jobs failing intermittently

  • Occasional 504 errors

Error during request to server.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=1.6847290992736816/900.0, error-message=, http-code=504, method=ExecuteStatement, no-retry-reason=non-retryable error, original-exception=, query-id=None, session-id=b'\x01\xf0\xb3\xb37"\x1e@\x86\x85\xdc\xebZ\x84wq'
2025-10-28 04:04:41.463403 (Thread-7 (worker)): 04:04:41 Unhandled error while executing
Exception on worker thread. Database Error
 Error during request to server.
2025-10-28 04:04:41.464025 (Thread-7 (worker)): 04:04:41 On model.xxxx.xxxx: Close
2025-10-28 04:04:41.464611 (Thread-7 (worker)): 04:04:41 Databricks adapter: Connection(session-id=01f0b3b3-3722-1e40-8685-dceb5a847771) - Closing

Has anyone here implemented a similar dbt + Databricks microbatch pipeline and faced the same reliability issues?

I'd love to hear how you've handled it, whether through:

  • dbt Cloud job retries or orchestration tweaks

  • Databricks SQL Warehouse tuning - I tried over-provisioning the warehouse several times over and it didn't make a difference

  • Adjusting the microbatch config (e.g., lookback period, concurrency, scheduling)

  • Or any other resiliency strategies

Thanks in advance for any insights!

1 REPLY

nayan_wylde
Esteemed Contributor

Here are a few options you can try to see if they resolve your issue.

1. SQL Warehouse Tuning

Use a Serverless SQL Warehouse with Photon for faster spin-up and query execution. [docs.getdbt.com]
Size Appropriately: Start with Medium or Large, and enable auto-scaling for concurrency.
Keep Warehouse Warm: Schedule a lightweight query every 10-15 minutes to prevent cold starts (see the sketch after this list).
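
For the keep-warm query, a minimal sketch; how you schedule it (for example as a Databricks SQL scheduled query every 10-15 minutes against the same warehouse the dbt job uses) is up to you, and the alias names are just placeholders:

    -- Trivial keep-warm probe: cheap to run, but enough to stop the
    -- warehouse from auto-stopping between hourly dbt Cloud runs.
    SELECT 1 AS keep_warm, current_timestamp() AS pinged_at;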

2. Microbatch Optimization

Reduce Lookback: Try lowering from 48h to 24h or 12h, especially if late-arriving data is rare.
Set event_time on all upstream models to avoid full scans.
Tune concurrent_batches: in recent dbt versions this is a true/false flag, so set it to false to force batches to run sequentially, and lower the job's thread count (e.g., from 8 to 2-4) if you still see too much parallel query load (see the config sketch after this list).
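
A rough sketch of what one hourly model's config could look like with these settings; the model name, column names, and begin date are placeholders, and the exact config options vary by dbt version, so check the microbatch docs for the release you're on:

    {{
      config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',   -- placeholder timestamp column on this model
        begin='2024-01-01',               -- placeholder earliest date dbt should ever backfill
        batch_size='hour',
        lookback=24,                      -- reprocess the last 24 hourly batches instead of 48
        concurrent_batches=false          -- run batches sequentially to cut parallel query load
      )
    }}

    select
        order_id,
        customer_id,
        amount,
        event_occurred_at
    from {{ ref('stg_orders') }}          -- upstream model should also set event_time in its config

Declaring event_time on the upstream staging models is what lets dbt filter each batch down to its time slice instead of scanning the full upstream table.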

3. dbt Cloud Job Resiliency

Enable Job Retries: Configure retries with exponential backoff in dbt Cloud.
Split Models into Multiple Jobs: Break the 27 models into logical groups to reduce thread contention.
Use dbt Artifacts for Monitoring: Track model run times and failures using the dbt_artifacts package (a sample query is sketched below).
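
As a starting point for that monitoring, a hedged sketch of a query over the dbt_artifacts output; the catalog/schema is a placeholder, and the model and column names (fct_dbt__model_executions, name, status, total_node_runtime, run_started_at) follow the package's documented models but may differ across versions, so adjust them to what your install actually builds:

    -- Which models fail or run long? Surface candidates for splitting out
    -- into their own job or for warehouse/config tuning.
    select
        name                                  as model_name,
        count_if(status <> 'success')         as failed_runs,
        round(avg(total_node_runtime), 1)     as avg_runtime_seconds,
        round(max(total_node_runtime), 1)     as max_runtime_seconds
    from your_catalog.your_schema.fct_dbt__model_executions
    where run_started_at >= current_timestamp() - interval 7 days
    group by name
    order by failed_runs desc, max_runtime_seconds desc;

Models that consistently sit at the top of that list are good candidates to move into a separate dbt Cloud job so one slow or flaky model doesn't take down the whole hourly run.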