dbt Cloud + Databricks SQL Warehouse with microbatching (48h lookback) - intermittent failures

dheeraj98
New Contributor II

Hey everyone,

I'm currently running an hourly dbt Cloud job (27 models with 8 threads) on a Databricks SQL Warehouse using the dbt microbatch approach, with a 48-hour lookback window.

But I'm running into some recurring issues:

  • Jobs failing intermittently

  • Occasional 504 errors

Error during request to server.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=1.6847290992736816/900.0, error-message=, http-code=504, method=ExecuteStatement, no-retry-reason=non-retryable error, original-exception=, query-id=None, session-id=b'\x01\xf0\xb3\xb37"\x1e@\x86\x85\xdc\xebZ\x84wq'
2025-10-28 04:04:41.463403 (Thread-7 (worker)): 04:04:41 Unhandled error while executing
Exception on worker thread. Database Error
 Error during request to server.
2025-10-28 04:04:41.464025 (Thread-7 (worker)): 04:04:41 On model.xxxx.xxxx: Close
2025-10-28 04:04:41.464611 (Thread-7 (worker)): 04:04:41 Databricks adapter: Connection(session-id=01f0b3b3-3722-1e40-8685-dceb5a847771) - Closing

Has anyone here implemented a similar dbt + Databricks microbatch pipeline and faced the same reliability issues?

I'd love to hear how you've handled it, whether through:

  • dbt Cloud job retries or orchestration tweaks

  • Databricks SQL Warehouse tuning - I tried over-provisioning the warehouse several times over and it didn't make a difference

  • Adjusting the microbatch config (e.g., lookback period, concurrency, scheduling)

  • Or any other resiliency strategies

Thanks in advance for any insights!

1 REPLY

nayan_wylde
Esteemed Contributor

Here are a few options you can try to see if they resolve your issue.

1. SQL Warehouse Tuning

Use a Serverless SQL Warehouse with Photon for faster spin-up and query execution. [docs.getdbt.com]
Size Appropriately: Start with Medium or Large, and enable auto-scaling for concurrency.
Keep Warehouse Warm: Schedule a lightweight query every 10-15 minutes to prevent cold starts (see the sketch after this list).
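
For the keep-warm query, a minimal sketch; how you schedule it (for example as a Databricks SQL scheduled query every 10-15 minutes against the same warehouse the dbt job uses) is up to you, and the alias names are just placeholders:

    -- Trivial keep-warm probe: cheap to run, but enough to stop the
    -- warehouse from auto-stopping between hourly dbt Cloud runs.
    SELECT 1 AS keep_warm, current_timestamp() AS pinged_at;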

2. Microbatch Optimization

Reduce Lookback: Try lowering from 48h to 24h or 12h, especially if late-arriving data is rare.
Set event_time on all upstream models to avoid full scans.
Tune concurrent_batches: in recent dbt versions this is a true/false flag, so set it to false to force batches to run sequentially, and lower the job's thread count (e.g., from 8 to 2-4) if you still see too much parallel query load (see the config sketch after this list).
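
A rough sketch of what one hourly model's config could look like with these settings; the model name, column names, and begin date are placeholders, and the exact config options vary by dbt version, so check the microbatch docs for the release you're on:

    {{
      config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',   -- placeholder timestamp column on this model
        begin='2024-01-01',               -- placeholder earliest date dbt should ever backfill
        batch_size='hour',
        lookback=24,                      -- reprocess the last 24 hourly batches instead of 48
        concurrent_batches=false          -- run batches sequentially to cut parallel query load
      )
    }}

    select
        order_id,
        customer_id,
        amount,
        event_occurred_at
    from {{ ref('stg_orders') }}          -- upstream model should also set event_time in its config

Declaring event_time on the upstream staging models is what lets dbt filter each batch down to its time slice instead of scanning the full upstream table.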

3. dbt Cloud Job Resiliency

Enable Job Retries: Configure retries with exponential backoff in dbt Cloud.
Split Models into Multiple Jobs: Break the 27 models into logical groups to reduce thread contention.
Use dbt Artifacts for Monitoring: Track model run times and failures using the dbt_artifacts package (a sample query is sketched below).
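
As a starting point for that monitoring, a hedged sketch of a query over the dbt_artifacts output; the catalog/schema is a placeholder, and the model and column names (fct_dbt__model_executions, name, status, total_node_runtime, run_started_at) follow the package's documented models but may differ across versions, so adjust them to what your install actually builds:

    -- Which models fail or run long? Surface candidates for splitting out
    -- into their own job or for warehouse/config tuning.
    select
        name                                  as model_name,
        count_if(status <> 'success')         as failed_runs,
        round(avg(total_node_runtime), 1)     as avg_runtime_seconds,
        round(max(total_node_runtime), 1)     as max_runtime_seconds
    from your_catalog.your_schema.fct_dbt__model_executions
    where run_started_at >= current_timestamp() - interval 7 days
    group by name
    order by failed_runs desc, max_runtime_seconds desc;

Models that consistently sit at the top of that list are good candidates to move into a separate dbt Cloud job so one slow or flaky model doesn't take down the whole hourly run.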