
Standardized Framework to update Databricks job definition using CI/CD

arjungoel1995
New Contributor

Hi Databricks support, I am looking for a standardized Databricks framework to update job definitions using DevOps, from non-production until they are productionized. Our current process for updating a Databricks job definition is as follows:

  1. In our source code repo, we have a `databricks_notebook_jobs` directory, under which we create a folder named after the job if it doesn't already exist. That folder should contain two files, `job-definition.json` and `spark_env_vars/<env>.json`, where `<env>` is `dev`, `qa` or `prod`.
  2. Then we update `job-definition.json`:
    1. Open the workflow in the Databricks console, click the three dots at the top right, and then click the `View YAML/JSON` option.
    2. Under the Job source, click JSON API and then click the Get option (similarly for YAML).
    3. Copy the content and paste it into a text editor or notepad.
    4. Then we remove some sections and replace keys such as the job cluster key, Google service account, cluster init script path (with the required path) and branch name with the corresponding variables defined in `devops/params.json`, `spark_env_vars/`, `databricks_notebook_jobs/default_job_config.json` and `.github/workflows/databricks_action.yaml`.
  3. After that we copy the updated job definition and use it to replace the existing definition in `databricks_notebook_jobs/<job_folder>/job-definition.json`.
  4. After updating the job definition, we bump the semantic version in `version.py` and `devops/release.json` as needed, which triggers the GHA workflow that updates the Databricks job (a sketch of such a workflow step follows this list).
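
For reference, the tail end of this pipeline looks roughly like the hypothetical sketch below; the trigger path, job folder name, job ID wiring and secrets are illustrative placeholders rather than our exact setup:

```yaml
# Hypothetical sketch of the kind of step .github/workflows/databricks_action.yaml
# runs after the version bump; everything named here is a placeholder.
name: databricks-job-update
on:
  push:
    paths:
      - "devops/release.json"        # the version bump that kicks off the update
jobs:
  update-job:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - name: Reset the job with the committed definition
        run: |
          # 'my_job' is a placeholder job folder; JOB_ID is assumed to be resolved
          # by an earlier step that is not shown here. The committed
          # job-definition.json (placeholders already substituted from
          # spark_env_vars/<env>.json and devops/params.json) becomes the
          # new_settings of a Jobs 2.1 reset call.
          jq -n --arg job_id "$JOB_ID" \
                --slurpfile settings databricks_notebook_jobs/my_job/job-definition.json \
                '{job_id: ($job_id | tonumber), new_settings: $settings[0]}' > payload.json
          curl -sS -X POST "$DATABRICKS_HOST/api/2.1/jobs/reset" \
               -H "Authorization: Bearer $DATABRICKS_TOKEN" \
               --data @payload.json
```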

This is a cumbersome and error-prone process: there are a lot of manual steps involved, and if we miss any of them while updating the workflow we have to start over and raise a new PR. Is there a way we could offer this as a standardized, self-service framework? We sometimes have to make frequent changes on a daily basis, and the process above is not an appropriate way to do that.

Please suggest.

Arjun Goel

AndySkinner
New Contributor II

Hi, I think this is what DABs (Databricks Asset Bundles) are for, and more recently pyDABs, which is a Pythonic way of implementing DABs.

What are Databricks Asset Bundles? | Databricks on AWS
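
For example, a single `databricks.yml` can declare the job once and override the environment-specific pieces per target, so promotion from dev to qa to prod is just a deploy against a different target. A minimal sketch, assuming a hypothetical job and placeholder workspace hosts and cluster settings:

```yaml
# databricks.yml — minimal sketch; job name, notebook path, hosts and cluster
# settings are placeholders, not a recommended configuration.
bundle:
  name: notebook_jobs

variables:
  catalog:
    description: Catalog passed to the notebook
    default: dev_catalog

resources:
  jobs:
    example_notebook_job:
      name: example_notebook_job
      job_clusters:
        - job_cluster_key: main_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: n2-standard-4          # placeholder node type
            num_workers: 2
      tasks:
        - task_key: main
          job_cluster_key: main_cluster
          notebook_task:
            notebook_path: ./notebooks/example_notebook.py
            base_parameters:
              catalog: ${var.catalog}

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev-workspace.example.com    # placeholder host
  qa:
    workspace:
      host: https://qa-workspace.example.com     # placeholder host
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.example.com   # placeholder host
    variables:
      catalog: prod_catalog
```

Deployment then reduces to `databricks bundle validate` and `databricks bundle deploy -t <target>`, which your GitHub Actions workflow can run instead of hand-editing the job JSON.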
