Warehousing & Analytics

Comparing Methods for Scheduling Streaming Updates via dbt

bobmclaren
New Contributor II

We are trying to schedule updates to streaming tables (STs) and materialized views (MVs) in Azure Databricks that we have defined in dbt.

The two options we are considering are the `SCHEDULE CRON` clause and simply scheduling `dbt run` commands via CI/CD.

The `SCHEDULE CRON` option seems attractive at first because it uses the *significantly cheaper* jobs compute SKUs. However, I cannot find any provision for orchestrating the refreshes so that dependencies are respected (i.e., refreshing a dependent MV only after its upstream ST has been refreshed). Because each object runs on its own schedule, a time gap has to be left between the ST schedule and the MV schedule, and that gap makes the data in MVs that depend on upstream STs less recent.
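
To make the time-gap concern concrete, here is a rough back-of-the-envelope sketch. The durations are made-up numbers, the function name is hypothetical, and it assumes (as a simplification) that an ST refresh picks up source data as of the moment it starts. It only illustrates the arithmetic, not how Databricks actually schedules anything.

```python
from datetime import timedelta
from typing import Optional


def mv_data_age_at_refresh(st_duration: timedelta,
                           schedule_gap: Optional[timedelta] = None) -> timedelta:
    """Age of the newest source data the MV can see when its refresh starts.

    schedule_gap=None models a dependency-aware run, where the MV refresh
    starts as soon as the ST refresh finishes. Otherwise the MV refresh
    starts a fixed gap after the ST refresh started.
    """
    if schedule_gap is None:
        return st_duration
    if schedule_gap < st_duration:
        # The MV would start before the ST finished and read the previous ST state.
        raise ValueError("schedule gap is shorter than the ST refresh duration")
    return schedule_gap


# Hypothetical numbers: a 4-minute ST refresh and a 15-minute safety gap
# between the two cron schedules.
st_duration = timedelta(minutes=4)
gap = timedelta(minutes=15)

chained = mv_data_age_at_refresh(st_duration)
scheduled = mv_data_age_at_refresh(st_duration, gap)
extra_staleness = scheduled - chained
print(extra_staleness)  # 0:11:00
```

With these assumed numbers, the fixed gap costs about 11 minutes of recency on every refresh, and the gap has to be sized for the slowest ST refresh rather than the typical one.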

The `dbt run` approach handles this elegantly, multithreading where necessary and refreshing MVs and STs in order according to their dependencies. Unfortunately, it seems that dbt must connect to a SQL warehouse and thus cannot use the more cost-efficient jobs compute SKUs.
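
For reference, the ordering behaviour we would want to keep looks roughly like the sketch below. This is not dbt's implementation, just a minimal illustration of dependency-ordered batching using Python's standard-library `graphlib`; the object names and the `refresh_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical objects: each key maps to the set of upstream objects it reads from.
dependencies = {
    "st_orders": set(),
    "st_customers": set(),
    "mv_orders_enriched": {"st_orders", "st_customers"},
    "mv_daily_revenue": {"mv_orders_enriched"},
}


def refresh_batches(deps):
    """Yield groups of objects that could be refreshed concurrently, upstream first."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    while sorter.is_active():
        # Everything returned here has all of its upstream objects already done.
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)


batches = list(refresh_batches(dependencies))
for batch in batches:
    print(batch)
```

Each batch can be refreshed in parallel, and a downstream MV only appears once every ST or MV it depends on has finished, which is exactly what independent cron schedules cannot guarantee.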

Is my understanding of the pros and cons laid out here correct? Are there other approaches that would make more cost-effective use of resources?

 
