
Standardized Framework to update Databricks job definition using CI/CD

arjungoel1995
New Contributor

Hi Databricks support, I am looking for a standardized Databricks framework to update job definitions using DevOps, from non-production until they are productionized. Our current process for updating a Databricks job definition is as follows:

  1. In our source code repo, we have a `databricks_notebook_jobs` directory, under which we create a folder named after the job if it doesn't already exist. That folder should contain two files, `job-definition.json` and `spark_env_vars/<env>.json`, where `<env>` is `dev`, `qa` or `prod`.
  2. Then we update `job-definition.json`:
    1. Open the workflow in the Databricks console, click the three dots at the top right, and then click the `View YAML/JSON` option.
    2. Under the Job source, click JSON API and then click the Get option (similarly for YAML).
    3. Copy the content and paste it into a text editor or notepad.
    4. Then we remove some sections and replace keys such as the job cluster key, Google service account, cluster init script path (with the required path) and branch name with the corresponding variables defined in `devops/params.json`, `spark_env_vars/`, `databricks_notebook_jobs/default_job_config.json` and `.github/workflows/databricks_action.yaml`.
  3. After that we copy the updated job definition and use it to replace the existing definition in `databricks_notebook_jobs/<job_folder>/job-definition.json`.
  4. After updating the job definition, we bump the semantic version in `version.py` and `devops/release.json` as needed, which triggers the GHA workflow that updates the Databricks job (a sketch of such a workflow step follows this list).
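
For reference, the tail end of this pipeline looks roughly like the hypothetical sketch below; the trigger path, job folder name, job ID wiring and secrets are illustrative placeholders rather than our exact setup:

```yaml
# Hypothetical sketch of the kind of step .github/workflows/databricks_action.yaml
# runs after the version bump; everything named here is a placeholder.
name: databricks-job-update
on:
  push:
    paths:
      - "devops/release.json"        # the version bump that kicks off the update
jobs:
  update-job:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
    steps:
      - uses: actions/checkout@v4
      - name: Reset the job with the committed definition
        run: |
          # 'my_job' is a placeholder job folder; JOB_ID is assumed to be resolved
          # by an earlier step that is not shown here. The committed
          # job-definition.json (placeholders already substituted from
          # spark_env_vars/<env>.json and devops/params.json) becomes the
          # new_settings of a Jobs 2.1 reset call.
          jq -n --arg job_id "$JOB_ID" \
                --slurpfile settings databricks_notebook_jobs/my_job/job-definition.json \
                '{job_id: ($job_id | tonumber), new_settings: $settings[0]}' > payload.json
          curl -sS -X POST "$DATABRICKS_HOST/api/2.1/jobs/reset" \
               -H "Authorization: Bearer $DATABRICKS_TOKEN" \
               --data @payload.json
```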

This is a cumbersome and error-prone process: there are a lot of manual steps involved, and if we miss any of them while updating the workflow we have to start over and raise a new PR. Is there a way we could offer this as a standardized, self-service framework? We sometimes have to make frequent changes on a daily basis, and the process above is not an appropriate way to do that.

Please suggest.

Arjun Goel

AndySkinner
New Contributor II

Hi, I think this is what DABs (Databricks Asset Bundles) are for, and more recently pyDABs, which is a Pythonic way of implementing DABs.

What are Databricks Asset Bundles? | Databricks on AWS
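
For example, a single `databricks.yml` can declare the job once and override the environment-specific pieces per target, so promotion from dev to qa to prod is just a deploy against a different target. A minimal sketch, assuming a hypothetical job and placeholder workspace hosts and cluster settings:

```yaml
# databricks.yml — minimal sketch; job name, notebook path, hosts and cluster
# settings are placeholders, not a recommended configuration.
bundle:
  name: notebook_jobs

variables:
  catalog:
    description: Catalog passed to the notebook
    default: dev_catalog

resources:
  jobs:
    example_notebook_job:
      name: example_notebook_job
      job_clusters:
        - job_cluster_key: main_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: n2-standard-4          # placeholder node type
            num_workers: 2
      tasks:
        - task_key: main
          job_cluster_key: main_cluster
          notebook_task:
            notebook_path: ./notebooks/example_notebook.py
            base_parameters:
              catalog: ${var.catalog}

targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://dev-workspace.example.com    # placeholder host
  qa:
    workspace:
      host: https://qa-workspace.example.com     # placeholder host
  prod:
    mode: production
    workspace:
      host: https://prod-workspace.example.com   # placeholder host
    variables:
      catalog: prod_catalog
```

Deployment then reduces to `databricks bundle validate` and `databricks bundle deploy -t <target>`, which your GitHub Actions workflow can run instead of hand-editing the job JSON.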
