Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Asset Bundles - Workspace or GIT?

JeremyFord
New Contributor III

We are just starting down the path of migrating from DBX to DAB. 

I have been able to successfully use DAB as per all the available documentation.  

We are very keen to use DAB for development deployments by the data engineering team and the benefits it will bring are many.

One thing I'm not clear on is the best practice of where and how to configure the source for Notebook tasks once we move beyond development.

With DBX we deployed all Jobs (300+) with a GIT source pointing to a specific release TAG, which gave us confidence that what was being executed was what we expected it to be, and that nobody could accidentally or intentionally edit the source on the Workspace.

Your Notebook Best Practices page still recommends the use of a GIT source: https://docs.databricks.com/en/notebooks/best-practices.html

Should we be trying to override the notebook source when deploying to Production to use a GIT source? 

If so how can we easily do this without having to duplicate config for 300+ jobs? (i.e. every job we deploy needs to be at the same GIT tag).

If not, what steps can we take to have the same level of confidence as when configured to use GIT tags?

Thanks.

 

 

Accepted Solutions

nicole_lu_PM
Databricks Employee

Hi Jeremy, 

When using a DAB, the job reads from the workspace source, not the Git source. We will update the doc page to include DAB as an option and specifically call out this point to avoid future confusion.

Check out this example from our talk, where we illustrate an end-to-end CI/CD journey with DAB: https://github.com/databricks/dais-cow-bff


2 REPLIES

Anne165Hernadez
New Contributor III

Hello!

Migrating from DBX to DAB is an exciting step! For configuring the source for Notebook tasks beyond development, it’s best practice to use a GIT source for production deployments to ensure consistency and prevent accidental changes. To avoid duplicating configuration for 300+ jobs, you can use a centralized configuration management system, environment variables, and automation scripts. Centralized configuration allows you to set the GIT tag centrally and apply it across all jobs. Environment variables can dynamically set the GIT tag, and automation scripts can fetch the latest GIT tag and update the job configurations accordingly. By implementing these practices, you can maintain the same level of confidence as with DBX, ensuring that your production environment is stable and consistent.
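One way to read "set the GIT tag centrally" is sketched below: a small Python helper that stamps the same `git_source` block onto every job definition before deployment, so the tag is written in exactly one place. This is only an illustration under assumptions, not a documented DAB workflow. The helper name `pin_jobs_to_tag`, the repository URL, the tag value and the job names are all hypothetical; the `git_source` fields (`git_url`, `git_provider`, `git_tag`) and the notebook task `source` value follow the Jobs API job settings format. Note that the accepted solution says bundle-deployed jobs read from the workspace source, so check whether this approach fits your deployment at all.

```python
from copy import deepcopy


def pin_jobs_to_tag(jobs, git_url, git_provider, git_tag):
    """Return copies of the job settings, all pointing at one Git tag.

    `jobs` maps a job name to a Jobs-API-style settings dict.
    The input is left unmodified.
    """
    pinned = {}
    for name, settings in jobs.items():
        job = deepcopy(settings)
        # One shared git_source per job; the tag is passed in once.
        job["git_source"] = {
            "git_url": git_url,
            "git_provider": git_provider,
            "git_tag": git_tag,
        }
        # Notebook tasks must read from Git rather than the workspace.
        for task in job.get("tasks", []):
            notebook_task = task.get("notebook_task")
            if notebook_task is not None:
                notebook_task["source"] = "GIT"
        pinned[name] = job
    return pinned


# Hypothetical job definitions; notebook paths are relative to the repo root.
jobs = {
    "ingest_orders": {
        "name": "ingest_orders",
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": "notebooks/ingest_orders"},
            }
        ],
    },
    "build_report": {
        "name": "build_report",
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {"notebook_path": "notebooks/build_report"},
            }
        ],
    },
}

# Hypothetical repository URL and release tag.
pinned = pin_jobs_to_tag(
    jobs,
    git_url="https://example.com/org/repo.git",
    git_provider="gitHub",
    git_tag="v1.2.3",
)
```

Because the tag is a single argument, a release pipeline could supply it from an environment variable or a bundle variable instead of editing 300+ job definitions by hand.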
