<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Standardized Framework to update Databricks job definition using CI/CD in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88839#M257</link>
    <description>&lt;P&gt;Hi Databricks support, I am looking for a standardized Databricks framework to update a job definition using DevOps, from non-production until it is productionized. Our current process for updating a Databricks job definition is as follows:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In our source code repo, we have a `databricks_notebook_jobs` directory, under which we create a folder with the job name if it doesn't exist. The job folder should contain two files, `job-definition.json` and `spark_env_vars/&amp;lt;env&amp;gt;.json`, where env is `dev`, `qa` or `prod`.&lt;/LI&gt;&lt;LI&gt;Then we update job-definition.json:&lt;OL&gt;&lt;LI&gt;Open the workflow in the Databricks console, click the three dots at the top right, then click the `View YAML/JSON` option.&lt;/LI&gt;&lt;LI&gt;Under Job source, click JSON API and then click Get (similarly for YAML).&lt;/LI&gt;&lt;LI&gt;Copy the content and paste it into a visual editor or notepad.&lt;/LI&gt;&lt;LI&gt;Remove some sections and replace variable keys such as the `Job cluster key`, `Google service account`, `Cluster init script path` and `Branch name` with their corresponding variables in `devops/params.json`, `spark_env_vars/`, `databricks_notebook_jobs/default_job_config.json` and `.github/workflows/databricks_action.yaml`.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;After that, we copy the updated job definition and replace the existing one in `databricks_notebook_jobs/job_folder/job-definition.json`.&lt;/LI&gt;&lt;LI&gt;Finally, we bump the semantic version in `version.py` and `devops/release.json`, which triggers the GHA workflow that updates the Databricks job.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;This is a cumbersome and error-prone process: there are many manual steps involved, and if we miss any step while updating the workflow we have to start over and raise a new PR. Is there a way to turn this into a standardized, self-service framework? We sometimes have to make changes on a daily basis, and the process above is not appropriate for that.&lt;/P&gt;&lt;P&gt;Please suggest.&lt;/P&gt;</description>
    <pubDate>Fri, 06 Sep 2024 09:09:48 GMT</pubDate>
    <dc:creator>arjungoel1995</dc:creator>
    <dc:date>2024-09-06T09:09:48Z</dc:date>
    <item>
      <title>Standardized Framework to update Databricks job definition using CI/CD</title>
      <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88839#M257</link>
      <description>&lt;P&gt;Hi Databricks support, I am looking for a standardized Databricks framework to update a job definition using DevOps, from non-production until it is productionized. Our current process for updating a Databricks job definition is as follows:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;In our source code repo, we have a `databricks_notebook_jobs` directory, under which we create a folder with the job name if it doesn't exist. The job folder should contain two files, `job-definition.json` and `spark_env_vars/&amp;lt;env&amp;gt;.json`, where env is `dev`, `qa` or `prod`.&lt;/LI&gt;&lt;LI&gt;Then we update job-definition.json:&lt;OL&gt;&lt;LI&gt;Open the workflow in the Databricks console, click the three dots at the top right, then click the `View YAML/JSON` option.&lt;/LI&gt;&lt;LI&gt;Under Job source, click JSON API and then click Get (similarly for YAML).&lt;/LI&gt;&lt;LI&gt;Copy the content and paste it into a visual editor or notepad.&lt;/LI&gt;&lt;LI&gt;Remove some sections and replace variable keys such as the `Job cluster key`, `Google service account`, `Cluster init script path` and `Branch name` with their corresponding variables in `devops/params.json`, `spark_env_vars/`, `databricks_notebook_jobs/default_job_config.json` and `.github/workflows/databricks_action.yaml`.&lt;/LI&gt;&lt;/OL&gt;&lt;/LI&gt;&lt;LI&gt;After that, we copy the updated job definition and replace the existing one in `databricks_notebook_jobs/job_folder/job-definition.json`.&lt;/LI&gt;&lt;LI&gt;Finally, we bump the semantic version in `version.py` and `devops/release.json`, which triggers the GHA workflow that updates the Databricks job.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;This is a cumbersome and error-prone process: there are many manual steps involved, and if we miss any step while updating the workflow we have to start over and raise a new PR. Is there a way to turn this into a standardized, self-service framework? We sometimes have to make changes on a daily basis, and the process above is not appropriate for that.&lt;/P&gt;&lt;P&gt;Please suggest.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Sep 2024 09:09:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88839#M257</guid>
      <dc:creator>arjungoel1995</dc:creator>
      <dc:date>2024-09-06T09:09:48Z</dc:date>
    </item>
    <item>
      <title>Re: Standardized Framework to update Databricks job definition using CI/CD</title>
      <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88840#M258</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;I think this is what DABs (Databricks Asset Bundles) are for, along with the more recent pyDABs, a Pythonic way of implementing DABs.&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/en/dev-tools/bundles/index.html" target="_blank"&gt;What are Databricks Asset Bundles? | Databricks on AWS&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Sep 2024 09:15:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/88840#M258</guid>
      <dc:creator>AndySkinner</dc:creator>
      <dc:date>2024-09-06T09:15:12Z</dc:date>
    </item>
    <item>
      <title>Re: Standardized Framework to update Databricks job definition using CI/CD</title>
      <link>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/91119#M280</link>
      <description>&lt;P&gt;Hi from the Git folders/Repos PM:&lt;/P&gt;
&lt;P&gt;DAB is the way to go, and we are working on an integration to author DABs directly in the workspace.&amp;nbsp;&lt;/P&gt;
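&lt;P&gt;As a sketch (the bundle name, job name and notebook path below are hypothetical), a minimal &lt;CODE&gt;databricks.yml&lt;/CODE&gt; replaces the copy-pasted JSON with a source-controlled definition and per-environment targets:&lt;/P&gt;
&lt;PRE&gt;# databricks.yml -- minimal Databricks Asset Bundle sketch
bundle:
  name: nightly_etl            # hypothetical bundle name

# One target per environment, in place of spark_env_vars/&amp;lt;env&amp;gt;.json
targets:
  dev:
    mode: development
  prod:
    mode: production

resources:
  jobs:
    nightly_etl_job:
      name: nightly_etl
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ./notebooks/main.py

# CI (e.g. a GHA step) then validates and deploys per target:
#   databricks bundle validate
#   databricks bundle deploy -t dev&lt;/PRE&gt;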
&lt;P&gt;Here's a DAIS talk where the DAB PM and I demoed some recommendations for source-controlling jobs:&amp;nbsp;&lt;A href="https://www.databricks.com/dataaisummit/session/path-production-databricks-project-cicd-seamless-inner-outer-dev-loops" target="_blank"&gt;https://www.databricks.com/dataaisummit/session/path-production-databricks-project-cicd-seamless-inner-outer-dev-loops&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 20 Sep 2024 01:45:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/standardized-framework-to-update-databricks-job-definition-using/m-p/91119#M280</guid>
      <dc:creator>nicole_lu_PM</dc:creator>
      <dc:date>2024-09-20T01:45:27Z</dc:date>
    </item>
  </channel>
</rss>

