Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Automating Version Control for Databricks Workflows

Prasad329
New Contributor

I am currently using Databricks Asset Bundles to manage and deploy workflows. While I have successfully automated the version control for notebooks, I am facing challenges with workflows. Specifically, I am looking to automate the process of fetching changes made in the Databricks UI back to the GitHub repository.

Current Workflow:

  1. I create and manage workflows in the Databricks UI.
  2. To update the GitHub repository, I manually export the workflow's YAML file from the UI and commit it to the repository.
  3. I then use GitHub Actions with Databricks CLI bundle commands to redeploy the updated workflows as a bundle.

After deploying workflows as a bundle, any changes made directly in the Databricks UI are not automatically fetched and updated in the GitHub repository. The UI shows a notification: "This task was deployed as part of a bundle, avoid editing this copy, instead edit the source and redeploy the bundle." This requires manual intervention to disconnect the workflow from the bundle, make changes, export the YAML, and commit it to the repository.

Requirement: I need to automate the process such that any changes made in the Databricks workflows from the UI are automatically fetched, committed to the GitHub repository, and redeployed as bundles without manual intervention.

Any insights or suggestions would be greatly appreciated!

2 REPLIES

-werners-
Esteemed Contributor III

Using only the UI, that is not possible, I think.
When using DAB and YAML files it can be done.
So I suggest you create the workflow using the UI (because it is easy to use) and then create a DAB out of that (using bundle generate).
I admit, asset bundles still need some work.
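
For reference, a rough sketch of that generate-then-deploy step with the Databricks CLI (the job ID and target name are placeholders; check `databricks bundle generate --help` on your CLI version for the exact flags):

```bash
# Pull an existing UI-created job into the bundle as YAML
# (run from the bundle root; 123456789 is a placeholder job ID)
databricks bundle generate job --existing-job-id 123456789

# Review the generated YAML (by default under resources/), commit it, then redeploy
databricks bundle deploy -t dev
```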

mark_ott
Databricks Employee

Automating the reverse synchronization of Databricks workflow (Job) changes made in the Databricks UI back to a GitHub repository is a significant challenge, mainly due to the intentional directionality and guardrails imposed by Databricks Asset Bundles. Currently, Databricks is designed to treat bundles as the source of truth, discouraging direct UI edits for workflows deployed via bundles. This is why manual export and commit to Git remain necessary, as highlighted by the warning you mentioned in the UI.

Here’s a breakdown of the reality and potential automation workarounds:

Why This Is Challenging

  • Bundle Source of Truth: When a workflow is managed by a Databricks Asset Bundle, Databricks expects all changes to originate from the Git-managed YAML/bundle. Any UI-based changes create a “disconnected” copy, and Databricks does not automatically propagate these changes back to the bundle.

  • No Native Reverse Sync: As of late 2025, there’s no official Databricks API or integration that automatically pushes workflow YAML changes made in the UI back to GitHub.

Possible (But Limited) Automation Strategies

1. Polling and Exporting Workflow Definitions

  • Use the Databricks Jobs API to poll for changes to jobs.

  • If a change is detected (e.g., based on last updated timestamps or computed diffs), programmatically export the workflow/job definition (GET /api/2.1/jobs/get) as YAML.

  • Push the updated YAML to your GitHub repository using a GitHub Action, or a scheduled serverless workflow (e.g. AWS Lambda/GCP Cloud Functions).
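
A minimal Python sketch of this poll-and-export loop (the host, token, export folder, and the diff-based change check are placeholder assumptions; the endpoints are the standard Jobs 2.1 API):

```python
import os
import requests
import yaml  # pip install pyyaml

HOST = os.environ["DATABRICKS_HOST"]      # e.g. https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]    # PAT or service principal token
HEADERS = {"Authorization": f"Bearer {TOKEN}"}
EXPORT_DIR = "exported_jobs"              # placeholder: a folder inside your Git clone

def list_jobs():
    """Page through /api/2.1/jobs/list and yield job summaries."""
    params = {"limit": 25}
    while True:
        resp = requests.get(f"{HOST}/api/2.1/jobs/list", headers=HEADERS, params=params)
        resp.raise_for_status()
        data = resp.json()
        yield from data.get("jobs", [])
        if not data.get("next_page_token"):
            break
        params["page_token"] = data["next_page_token"]

def export_job(job_id):
    """Fetch the full job settings via /api/2.1/jobs/get and render them as YAML."""
    resp = requests.get(f"{HOST}/api/2.1/jobs/get", headers=HEADERS, params={"job_id": job_id})
    resp.raise_for_status()
    return yaml.safe_dump(resp.json()["settings"], sort_keys=True)

def main():
    os.makedirs(EXPORT_DIR, exist_ok=True)
    changed = []
    for job in list_jobs():
        job_id = job["job_id"]
        new_yaml = export_job(job_id)
        path = os.path.join(EXPORT_DIR, f"job_{job_id}.yml")
        old_yaml = open(path).read() if os.path.exists(path) else ""
        if new_yaml != old_yaml:          # "change detected" = diff against the last export
            with open(path, "w") as f:
                f.write(new_yaml)
            changed.append(job_id)
    print("Changed jobs:", changed or "none")
    # A follow-up CI step (or the GitHub API) would commit/PR anything under EXPORT_DIR.

if __name__ == "__main__":
    main()
```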

Caveats:

  • This approach only works if workflows are not bound to bundles, or are explicitly disconnected, since otherwise changes in the UI aren’t reflected in the bundle YAML.

  • You’ll still need logic to handle that disconnect step, since jobs edited in the UI after being deployed through bundles become “disconnected copies.”

2. Forcibly Breaking the Bundle Link

  • Develop automation that, when a change is detected in a workflow, automatically triggers a disconnect from the bundle (by scripting the Databricks UI, or via the API if one is available).

  • After disconnection, use the same export-and-commit process as above.

Risk: This may cause confusion, since the job will no longer follow the bundle-based, version-controlled process recommended by Databricks.
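
To my knowledge there is no documented API call that performs the "disconnect from source" action itself, so that part would have to drive the UI. What the API can do is tell you which jobs are bundle-managed before you touch them. A sketch, assuming the edit_mode and deployment fields that jobs/get returns for bundle-deployed jobs (verify these field names against the current Jobs API reference):

```python
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]

def is_bundle_managed(job_id: int) -> bool:
    """Heuristic: bundle-deployed jobs are returned by jobs/get with edit_mode
    UI_LOCKED and/or a deployment block of kind BUNDLE (assumed field names)."""
    resp = requests.get(
        f"{HOST}/api/2.1/jobs/get",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"job_id": job_id},
    )
    resp.raise_for_status()
    settings = resp.json().get("settings", {})
    return (
        settings.get("edit_mode") == "UI_LOCKED"
        or settings.get("deployment", {}).get("kind") == "BUNDLE"
    )
```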

3. Manual but Guided Process

If full automation is not feasible due to these design constraints, the next best option is process automation:

  • Alert users who edit in the UI that they must export the YAML and commit to Git so that bundle-based CI/CD can take over.

  • Use notification bots (Slack, Teams, email) to guide the process, or even initiate a job that exports and proposes a PR via the GitHub API.
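
For the last point, a rough sketch of proposing the exported YAML as a pull request via the GitHub REST API (the repository, branch names, and token variable are placeholders; it assumes a branch with the exported files has already been pushed, e.g. by the polling script above):

```python
import os
import requests

GITHUB_TOKEN = os.environ["GITHUB_TOKEN"]   # placeholder: a token with repo scope
REPO = "my-org/my-workflows-repo"           # placeholder repository
HEAD_BRANCH = "databricks-ui-export"        # branch that already contains the exported YAML
BASE_BRANCH = "main"

def open_export_pr():
    """Open a PR proposing the UI-exported job definitions for review."""
    resp = requests.post(
        f"https://api.github.com/repos/{REPO}/pulls",
        headers={
            "Authorization": f"Bearer {GITHUB_TOKEN}",
            "Accept": "application/vnd.github+json",
        },
        json={
            "title": "Sync workflow changes made in the Databricks UI",
            "head": HEAD_BRANCH,
            "base": BASE_BRANCH,
            "body": "Automated export of job definitions changed in the Databricks UI.",
        },
    )
    resp.raise_for_status()
    print("Opened PR:", resp.json()["html_url"])

if __name__ == "__main__":
    open_export_pr()
```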

Best Practice Recommendation

The intent of Databricks Asset Bundles is to keep the source-of-truth in Git (not Databricks UI). Editing workflow definitions directly in the UI is discouraged for bundle-managed jobs. It’s best to encourage the following workflow:

  • Always make changes in the GitHub-managed YAML (bundle).

  • Redeploy via CI/CD.

  • If urgent UI edits are required, disconnect and treat the workflow as “UI managed,” with all the attendant manual processes.
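
As an illustration of the "redeploy via CI/CD" step, here is a minimal GitHub Actions sketch (branch, target name, and secrets are placeholders; databricks/setup-cli is the published CLI setup action):

```yaml
name: deploy-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: databricks/setup-cli@main
      - name: Validate and deploy the bundle
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          databricks bundle validate -t prod
          databricks bundle deploy -t prod
```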

Summary

  • Polling the Jobs API and exporting: Limited. Only works for non-bundle jobs, or once a job has been disconnected.

  • Programmatic disconnect: Possible. Risks state inconsistency and breaks the bundle workflow.

  • Full automation: Not native. No supported reverse sync from UI to Git.

  • Process improvements: Recommended. Educate, alert, and guide users toward bundle practices.


In Summary:
Native, fully automated reverse synchronization from the Databricks UI to GitHub for bundle-managed workflows is not supported due to how Databricks enforces bundle-as-source-of-truth discipline. Workarounds exist for non-bundle jobs but are brittle and not recommended for long-term manageability. Encouraging all workflow changes through version-controlled bundles is the most sustainable approach.