<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Standardized Framework to update Databricks job definition using CI/CD in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88839#M257</link>
    <description>&lt;P&gt;Hi Databricks support, I am looking for a standardized Databricks framework to update a job definition using DevOps, from non-production until it is productionized. Our current process for updating a Databricks job definition is as follows:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In our source code repo, we have a `databricks_notebook_jobs` directory, under which we create a folder with the job name if it doesn't exist. The job folder should contain two files, `job-definition.json` and `spark_env_vars/&amp;lt;env&amp;gt;.json`, where env is `dev`, `qa` or `prod`.&lt;/LI&gt;&lt;LI&gt;Then we update job-definition.json:&lt;OL&gt;&lt;LI&gt;Open the workflow in the Databricks console, click the three dots at the top right, then click the `View YAML/JSON` option.&lt;/LI&gt;&lt;LI&gt;Under Job source, click JSON API and then click Get (similarly for YAML).&lt;/LI&gt;&lt;LI&gt;Copy the content and paste it into a visual editor or notepad.&lt;/LI&gt;&lt;LI&gt;Remove some sections and replace variable keys such as the `Job cluster key`, `Google service account`, `Cluster init script path` and `Branch name` with their corresponding variables in `devops/params.json`, `spark_env_vars/`, `databricks_notebook_jobs/default_job_config.json` and `.github/workflows/databricks_action.yaml`.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;After that, we copy the updated job definition and replace the existing one in `databricks_notebook_jobs/job_folder/job-definition.json`.&lt;/LI&gt;&lt;LI&gt;Finally, we bump the semantic version in `version.py` and `devops/release.json`, which triggers the GHA workflow that updates the Databricks job.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;This is a cumbersome and error-prone process: there are many manual steps involved, and if we miss any step while updating the workflow we have to start over and raise a new PR. Is there a way to turn this into a standardized, self-service framework? We sometimes have to make changes on a daily basis, and the process above is not appropriate for that.&lt;/P&gt;&lt;P&gt;Please suggest.&lt;/P&gt;</description>
    <pubDate>Fri, 06 Sep 2024 09:09:48 GMT</pubDate>
    <dc:creator>arjungoel1995</dc:creator>
    <dc:date>2024-09-06T09:09:48Z</dc:date>
    <item>
      <title>Standardized Framework to update Databricks job definition using CI/CD</title>
      <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88839#M257</link>
      <description>&lt;P&gt;Hi Databricks support, I am looking for a standardized Databricks framework to update a job definition using DevOps, from non-production until it is productionized. Our current process for updating a Databricks job definition is as follows:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In our source code repo, we have a `databricks_notebook_jobs` directory, under which we create a folder with the job name if it doesn't exist. The job folder should contain two files, `job-definition.json` and `spark_env_vars/&amp;lt;env&amp;gt;.json`, where env is `dev`, `qa` or `prod`.&lt;/LI&gt;&lt;LI&gt;Then we update job-definition.json:&lt;OL&gt;&lt;LI&gt;Open the workflow in the Databricks console, click the three dots at the top right, then click the `View YAML/JSON` option.&lt;/LI&gt;&lt;LI&gt;Under Job source, click JSON API and then click Get (similarly for YAML).&lt;/LI&gt;&lt;LI&gt;Copy the content and paste it into a visual editor or notepad.&lt;/LI&gt;&lt;LI&gt;Remove some sections and replace variable keys such as the `Job cluster key`, `Google service account`, `Cluster init script path` and `Branch name` with their corresponding variables in `devops/params.json`, `spark_env_vars/`, `databricks_notebook_jobs/default_job_config.json` and `.github/workflows/databricks_action.yaml`.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;After that, we copy the updated job definition and replace the existing one in `databricks_notebook_jobs/job_folder/job-definition.json`.&lt;/LI&gt;&lt;LI&gt;Finally, we bump the semantic version in `version.py` and `devops/release.json`, which triggers the GHA workflow that updates the Databricks job.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;This is a cumbersome and error-prone process: there are many manual steps involved, and if we miss any step while updating the workflow we have to start over and raise a new PR. Is there a way to turn this into a standardized, self-service framework? We sometimes have to make changes on a daily basis, and the process above is not appropriate for that.&lt;/P&gt;&lt;P&gt;Please suggest.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Sep 2024 09:09:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88839#M257</guid>
      <dc:creator>arjungoel1995</dc:creator>
      <dc:date>2024-09-06T09:09:48Z</dc:date>
    </item>
    <item>
      <title>Re: Standardized Framework to update Databricks job definition using CI/CD</title>
      <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88840#M258</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;I think this is what DABs (Databricks Asset Bundles) are for, along with the more recent pyDABs, a Pythonic way of implementing DABs.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/dev-tools/bundles/index.html" target="_blank"&gt;What are Databricks Asset Bundles? | Databricks on AWS&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Sep 2024 09:15:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88840#M258</guid>
      <dc:creator>AndySkinner</dc:creator>
      <dc:date>2024-09-06T09:15:12Z</dc:date>
    </item>
    <item>
      <title>Re: Standardized Framework to update Databricks job definition using CI/CD</title>
      <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/91119#M280</link>
      <description>&lt;P&gt;Hi from the Git folders/Repos PM:&lt;/P&gt;
&lt;P&gt;DAB is the way to go, and we are working on an integration to author DABs directly in the workspace.&amp;nbsp;&lt;/P&gt;
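&lt;P&gt;As a sketch (the bundle name, job name and notebook path below are hypothetical), a minimal &lt;CODE&gt;databricks.yml&lt;/CODE&gt; replaces the copy-pasted JSON with a source-controlled definition and per-environment targets:&lt;/P&gt;
&lt;PRE&gt;# databricks.yml -- minimal Databricks Asset Bundle sketch
bundle:
  name: nightly_etl            # hypothetical bundle name

# One target per environment, in place of spark_env_vars/&amp;lt;env&amp;gt;.json
targets:
  dev:
    mode: development
  prod:
    mode: production

resources:
  jobs:
    nightly_etl_job:
      name: nightly_etl
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/main.py

# CI (e.g. a GHA step) then validates and deploys per target:
#   databricks bundle validate
#   databricks bundle deploy -t dev&lt;/PRE&gt;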
&lt;P&gt;Here's a DAIS talk where the DAB PM and I demoed some recommendations for source-controlling jobs:&amp;nbsp;&lt;A href="https://www.databricks.com/dataaisummit/session/path-production-databricks-project-cicd-seamless-inner-outer-dev-loops" target="_blank"&gt;https://www.databricks.com/dataaisummit/session/path-production-databricks-project-cicd-seamless-inner-outer-dev-loops&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 01:45:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/91119#M280</guid>
      <dc:creator>nicole_lu_PM</dc:creator>
      <dc:date>2024-09-20T01:45:27Z</dc:date>
    </item>
  </channel>
</rss>

