Hey everyone,
I'm currently running an hourly dbt Cloud job (27 models with 8 threads) on a Databricks SQL Warehouse using the dbt microbatch approach, with a 48-hour lookback window.
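For context, the relevant part of a model's config looks roughly like this (model/column names and the `begin` date are placeholders; note that `lookback` counts batches, so 48 hourly batches gives the 48-hour window):

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='microbatch',
    event_time='event_ts',   -- placeholder timestamp column
    begin='2024-01-01',      -- placeholder backfill start date
    batch_size='hour',
    lookback=48              -- 48 hourly batches = 48-hour lookback
  )
}}

select * from {{ ref('stg_events') }}  -- placeholder upstream model
```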
But I'm running into some recurring issues:
Error during request to server.
Error properties: attempt=1/30, bounded-retry-delay=None, elapsed-seconds=1.6847290992736816/900.0, error-message=, http-code=504, method=ExecuteStatement, no-retry-reason=non-retryable error, original-exception=, query-id=None, session-id=b'\x01\xf0\xb3\xb37"\x1e@\x86\x85\xdc\xebZ\x84wq'
2025-10-28 04:04:41.463403 (Thread-7 (worker)): 04:04:41 Unhandled error while executing
Exception on worker thread. Database Error
Error during request to server.
2025-10-28 04:04:41.464025 (Thread-7 (worker)): 04:04:41 On model.xxxx.xxxx: Close
2025-10-28 04:04:41.464611 (Thread-7 (worker)): 04:04:41 Databricks adapter: Connection(session-id=01f0b3b3-3722-1e40-8685-dceb5a847771) - Closing
Has anyone here implemented a similar dbt + Databricks microbatch pipeline and faced the same reliability issues?
I'd love to hear how you've handled it, whether through:
dbt Cloud job retries or orchestration tweaks
Databricks SQL Warehouse tuning - I tried over-provisioning the warehouse several fold and it made no difference
Adjusting the microbatch config (e.g., lookback period, concurrency, scheduling)
Or any other resiliency strategies
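On the retry side, beyond dbt Cloud's built-in job retry, one thing I've been considering is wrapping the trigger with exponential backoff so transient 504s get another chance. A minimal sketch of the idea (the `flaky_job` stub is a placeholder for whatever actually kicks off and polls the run, e.g. a dbt Cloud API call):

```python
import time

def run_with_retries(job, max_attempts=3, base_delay=1.0):
    """Retry a flaky callable with exponential backoff.

    `job` is any callable that raises on failure (e.g. a function that
    triggers a dbt Cloud run via its API and raises if the run fails).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return job()
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts, surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"attempt {attempt} failed ({exc}); retrying in {delay:.2f}s")
            time.sleep(delay)

# Demo with a stub that fails twice before succeeding:
calls = {"n": 0}
def flaky_job():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("HTTP 504 from SQL Warehouse")
    return "success"

result = run_with_retries(flaky_job, base_delay=0.01)
print(result)  # eventually prints "success" on the third attempt
```

Not sure it's the right fix here, though, since the connector log says "no-retry-reason=non-retryable error", which suggests the driver itself refuses to retry that 504.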
Thanks in advance for any insights!