Warehousing & Analytics

Comparing Methods for Scheduling Streaming Updates via dbt

bobmclaren
Visitor

We are trying to schedule updates to streaming tables and materialized views in Azure Databricks that we have defined in dbt.

Two options we are considering are `SCHEDULE CRON` and just scheduling `dbt run` commands via CI/CD. 

The `SCHEDULE CRON` option seems attractive at first because it uses the *significantly cheaper* jobs compute SKUs. However, I cannot find any provision for orchestrating the refreshes so that dependencies are respected (e.g., refreshing a dependent MV only after its upstream ST has been refreshed). This hurts the recency of the data in MVs that depend on upstream STs, because a time gap has to be left between their schedules.

The `dbt run` approach handles this elegantly, multithreading where necessary and refreshing MVs/STs in order according to their dependencies. Unfortunately, it seems that dbt must connect to a SQL warehouse and thus cannot use the more cost-efficient jobs compute SKUs.
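To make the ordering requirement concrete, here is a minimal sketch of a dependency-ordered refresh plan. The object names (`st_orders`, `mv_daily_revenue`, and so on) are hypothetical, and the code only computes and prints an order using Python's standard `graphlib`. It does not call dbt or Databricks, and it is not a description of how dbt implements this internally.

```python
from graphlib import TopologicalSorter

# Hypothetical objects: each key maps to the set of upstream objects it reads from.
dependencies = {
    "st_orders": set(),
    "st_customers": set(),
    "mv_orders_enriched": {"st_orders", "st_customers"},
    "mv_daily_revenue": {"mv_orders_enriched"},
}


def refresh_batches(deps):
    """Group objects into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything returned here has all of its upstreams already refreshed,
        # so the members of one batch could be refreshed in parallel.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


for number, batch in enumerate(refresh_batches(dependencies), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

With independent cron schedules, each of these batches would instead need its own fixed time slot, which is where the gap between an ST refresh and its dependent MV refresh comes from.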

Is my understanding of the pros and cons laid out here correct? Are there other approaches that would make more cost-effective use of resources?

 
