Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
ajalisatgi
Databricks Employee

Introduction

Coordinating code changes across environments, especially in data engineering and data science teams, can get messy without a structured process. Issues like inconsistent deployments, unstable releases, or last-minute hotfixes often arise when development practices are loosely defined or lack clear conventions. While some teams lean toward trunk-based development or simple branch-per-environment models, others benefit from more structure.

GitFlow is one such branching strategy that introduces a clear framework for managing parallel development, releases, and hotfixes. It’s not the only way to manage version control, but it shines in scenarios where teams need a disciplined approach to separate active development from production-ready code.

When combined with Databricks Asset Bundles (DABs) and GitHub Actions, GitFlow helps create a reliable, end-to-end workflow for deploying projects across DEV, QA, and PROD environments. In this post, we’ll explore how this integration works and when adopting GitFlow makes the most sense for Databricks-based teams.

 

What is GitFlow

GitFlow is a Git branching model designed to manage complex software development projects. It includes two stable branches and several supporting branches:

  • Stable Branches:
    • main: The stable production branch where all releases are tagged.
    • develop: The integration branch where features and fixes are merged before being promoted to the main branch.
  • Supporting Branches:
    • feature: Used for developing new features. Branched off from develop and merged back into develop once complete.
    • release: Used to prepare a new production release. Branched from develop and merged into both main and develop.
    • hotfix: Immediate fixes for production issues. Branched from main and merged back into both main and develop.

The GitFlow workflow excels at managing releases by isolating work into different branch types. This separation improves control over which changes move to specific environments on a defined schedule, while still allowing for accelerated deployments when necessary.
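
The branch rules above can be written down as a small lookup table. The following is a minimal Python sketch for illustration only; it assumes branches are named with a `type/name` prefix (for example `feature/my-change`), which is a common convention but is not stated in this post, and the names `BranchRule`, `GITFLOW_RULES`, and `rule_for` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BranchRule:
    base: str           # branch the supporting branch is created from
    merges_into: tuple  # branches it is merged back into when done


# Supporting branch types, as described in the list above.
GITFLOW_RULES = {
    "feature": BranchRule(base="develop", merges_into=("develop",)),
    "release": BranchRule(base="develop", merges_into=("main", "develop")),
    "hotfix": BranchRule(base="main", merges_into=("main", "develop")),
}


def rule_for(branch_name: str) -> BranchRule:
    """Return the GitFlow rule for a branch named like 'feature/my-change'.

    Assumption: the branch type is the text before the first '/'.
    """
    prefix = branch_name.split("/", 1)[0]
    if prefix not in GITFLOW_RULES:
        raise ValueError(f"not a GitFlow supporting branch: {branch_name}")
    return GITFLOW_RULES[prefix]
```

A check like `rule_for("hotfix/fix-null-keys").base` returns `"main"`, which is the kind of rule a CI job could use to reject a PR opened against the wrong branch.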

ajalisatgi_0-1746637133406.png

 

Why Choose GitFlow

GitFlow stands out from other Git workflows (like trunk-based development, GitHub Flow, or GitLab Flow) due to its structured, role-based branching strategy, which brings several distinct advantages:

  • Structured Release Management: GitFlow enforces a clear separation between long-lived branches (main for production, develop for ongoing integration) and purpose-specific short-lived branches (feature, release, and hotfix). This structure creates a disciplined, predictable development lifecycle—particularly valuable for teams managing versioned releases.
  • Parallel Development: Its branching model enables multiple developers or teams to work on features, hotfixes, or release preparations simultaneously without stepping on each other’s toes. This is particularly helpful in environments where concurrent streams of work must be isolated until ready.
  • Stability and Predictability: By reserving the main branch strictly for production-ready code and staging changes through develop and release branches, GitFlow helps keep production stable. In contrast to trunk-based development, where everything merges to a single mainline, GitFlow’s staging layers offer greater confidence in planned releases and regression testing.
  • Support for Complex Projects: GitFlow is ideal for larger teams or products with multiple environments (dev, qa, staging, prod) and a need for structured testing, QA cycles, or even concurrent support of multiple production versions.
  • Hotfix Agility: While most Git strategies can handle hotfixes, GitFlow provides a dedicated hotfix branch type designed specifically for this use case. This allows urgent fixes to be made directly off the main branch, then cleanly merged back into both main and develop, minimizing disruption and ensuring continuity across the codebase.

 

Why DABs Are Useful

Databricks Asset Bundles (DABs) provide a structured and streamlined way to package, share, and deploy Databricks assets—such as jobs, notebooks, and dependencies. Their key benefits include:

  • Efficient Code Versioning and Collaboration: DABs support systematic version control and seamless collaboration when integrated with any Git-based source control platform (e.g., GitHub, GitLab, or Azure Repos).
  • Declarative and Simple Deployment: Using a YAML-based format, DABs make it easy to define and deploy multiple resources with minimal configuration.
  • Substitutions and Variables in DABs: DABs support substitutions and custom variables, which make bundle configuration files modular, reusable, and environment-aware. Settings such as cluster IDs, job parameters, and environment-specific values can be dynamically injected at deploy time, reducing duplication and manual tweaking.
  • Developer-Driven, DevOps-Governed Workflows: DABs shift asset packaging and configuration closer to the developer while allowing DevOps engineers to retain control over staging and production deployments. In practice, developers define what is deployed, and DevOps engineers govern how and when it is promoted.
  • Automated Software Development Lifecycle (SDLC): DABs facilitate the automation of deployment pipelines and structured SDLC practices. When combined with CI/CD tools, they enable governance workflows such as code reviews, testing gates, and manual approvals before production releases.
  • Scalability and Maintainability: By reducing code duplication and enabling standardized, reusable configurations, DABs help manage complex data pipelines more efficiently and at scale.
  • Governance and Risk Reduction: While DABs do not enforce approvals natively, they integrate with CI/CD pipelines that can enforce approval-based controls—ensuring only authorized and tested changes are promoted to production.

These capabilities make DABs especially valuable for organizations managing complex data engineering workflows, implementing CI/CD, and aiming for scalable, maintainable, and governed deployments.
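
To make the substitution idea concrete, here is a minimal Python sketch of how a `${var.<name>}` reference could be resolved from a default value plus a per-environment override. It illustrates the concept only and is not how the Databricks CLI implements it; the function name `resolve` and the variable names in the usage note are hypothetical, and the pattern is simplified to letters, digits, and underscores.

```python
import re

# Matches references of the form ${var.name}; simplified for illustration.
_VAR_PATTERN = re.compile(r"\$\{var\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve(template: str, defaults: dict, overrides: dict) -> str:
    """Replace each ${var.name} with its value.

    Values in `overrides` (e.g. for one environment) win over `defaults`.
    Raises KeyError if a referenced variable has no value at all.
    """
    values = {**defaults, **overrides}

    def _replace(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"undefined variable: {name}")
        return str(values[name])

    return _VAR_PATTERN.sub(_replace, template)
```

For example, `resolve("${var.catalog}.sales", {"catalog": "dev_catalog"}, {"catalog": "prod_catalog"})` yields `"prod_catalog.sales"`: the same configuration line produces an environment-specific value without being duplicated.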

 

How GitFlow and DABs Complement Each Other

  • Integration with CI/CD: DABs can be seamlessly integrated into a GitFlow-based repository. GitFlow manages source code changes through structured branching, while DABs handle deployment automation and environment-specific configurations.
  • Governance and Control: GitFlow’s branching model—separating features, releases, and hotfixes—aligns well with DABs' deployment lifecycle. Together, they support gated promotions to staging or production via CI/CD approvals and tests.
  • Scalability and Collaboration: GitFlow supports parallel development across teams and features. DABs complement this by enabling standardized, scalable deployments across environments, enhancing both collaboration and release confidence.

 

Integrating GitFlow and DABs

DABs can be versioned and managed in a Git repository, aligning with GitFlow practices. They allow for seamless promotion of assets from development to production, mirroring the GitFlow branching strategy. This integration ensures that changes are systematically tracked and deployed across environments, maintaining consistency and reducing errors.

ajalisatgi_1-1746637207526.png
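
One way to picture this mirroring is a mapping from branch type to bundle target. The Python sketch below is a hedged illustration: the target names `dev`, `qa`, and `prod` and the develop-to-dev mapping are assumptions (your `databricks.yml` may use different names), while `databricks bundle deploy -t <target>` is the Databricks CLI command for deploying a bundle to a named target. The helper name `deploy_command` is hypothetical.

```python
# Assumed mapping from GitFlow branch type to bundle target name.
# Release -> QA and main -> PROD follow the process described in this post;
# develop -> dev is an assumption made for the example.
BRANCH_TO_TARGET = {
    "develop": "dev",
    "release": "qa",
    "main": "prod",
}


def deploy_command(branch: str) -> list:
    """Build the CLI invocation that deploys the bundle for a given branch.

    Branches such as 'release/1.2.0' are matched on the part before '/'.
    """
    key = branch.split("/", 1)[0]
    target = BRANCH_TO_TARGET.get(key)
    if target is None:
        raise ValueError(f"no deployment target for branch: {branch}")
    return ["databricks", "bundle", "deploy", "-t", target]
```

Returning the command as a list (rather than a shell string) makes it straightforward to pass to `subprocess.run` from a CI step without quoting issues.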

 

Pre-requisites and Initial Repository Setup

Follow these steps to set up your new or existing repository.

  1. Clone the databricks-blogposts repository.
  2. Copy all the contents under the ‘2025-04-how-to-use-gitflow-and-dabs-for-seamless-databricks-deployments’ folder to the root of the repository that you wish to manage using GitFlow.
  3. Follow the steps in the README under ‘2025-04-how-to-use-gitflow-and-dabs-for-seamless-databricks-deployments’ to set up your repository for different environments as shown in Figure 2.

 

Deployment Process Using GitFlow and DABs

Standard Release Process

ajalisatgi_2-1746637257083.png

Step 1: Collaboration and Development

It all begins with the developers. Each developer creates individual feature branches derived from the develop branch. They independently work on features, bug fixes, or improvements. Once their work is ready, developers submit a Pull Request (PR) for peer review, fostering collaboration and ensuring high-quality code through collective insights.
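
As a rough sketch of this step, the Python helper below builds the Git commands for starting a feature branch from an up-to-date develop. The `feature/<slug>` naming scheme and the helper name `feature_branch_commands` are assumptions for illustration; the repository in this post may follow a different convention.

```python
import re


def feature_branch_commands(title: str) -> list:
    """Return the git commands that create a feature branch off develop.

    The branch name is derived from a free-text title, e.g.
    'Add Sales Pipeline' -> 'feature/add-sales-pipeline' (assumed scheme).
    """
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    if not slug:
        raise ValueError("title must contain letters or digits")
    branch = f"feature/{slug}"
    return [
        ["git", "switch", "develop"],     # start from the integration branch
        ["git", "pull", "--ff-only"],     # make sure it is up to date
        ["git", "switch", "-c", branch],  # create and check out the feature branch
    ]
```

Once the work is committed and pushed, the PR is opened against develop, as described above.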

Step 2: Integrating and Releasing

  • When the develop branch has sufficient updates and features ready for integration testing, a DevOps Engineer initiates the release process by manually triggering the ‘Draft new release’ GitHub Actions Workflow and specifying whether the release type is "MAJOR" or "MINOR".

  • Typically:

    • Major: Indicates breaking changes or significant new features.
    • Minor: Indicates new features or enhancements that are backward-compatible.

ajalisatgi_1-1746571229385.gif

  • The ‘Draft new release’ GitHub Actions Workflow completes the following tasks:

    • Deploy to QA: Databricks Asset Bundles (DABs) are deployed directly to a QA Databricks workspace for thorough testing.

    • Version Control: The GitHub Actions workflow fetches the current release version and increments it based on the release type.

    • Release Branch Creation: A new version-specific release branch is created.

      ajalisatgi_2-1746571319345.png
    • Ready for Production: A PR from this release branch to the main branch is automatically generated, ready for review and approval.

ajalisatgi_4-1746571442202.png
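
The version increment and branch creation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the workflow's actual code: it assumes versions follow a `MAJOR.MINOR.PATCH` scheme (optionally prefixed with `v`) and that release branches are named `release/<version>`; both function names are hypothetical.

```python
def bump_version(current: str, release_type: str) -> str:
    """Increment a MAJOR.MINOR.PATCH version for a standard release.

    MAJOR resets minor and patch to 0; MINOR resets patch to 0.
    """
    major, minor, _patch = (int(part) for part in current.lstrip("v").split("."))
    kind = release_type.upper()
    if kind == "MAJOR":
        return f"{major + 1}.0.0"
    if kind == "MINOR":
        return f"{major}.{minor + 1}.0"
    raise ValueError(f"release type must be MAJOR or MINOR, got: {release_type}")


def release_branch_name(version: str) -> str:
    """Name of the version-specific release branch (assumed naming scheme)."""
    return f"release/{version}"
```

For a current version of `1.4.2`, a MINOR release gives `1.5.0` and a MAJOR release gives `2.0.0`; the PR to main would then be opened from `release_branch_name("1.5.0")`, i.e. `release/1.5.0`.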

Step 3: Automated Production Deployment

  • After peer review and approval, the DevOps Engineer merges the release branch into the main branch, which triggers the ‘Publish new Release’ GitHub Actions workflow automatically.

ajalisatgi_5-1746571556636.gif

  • The ‘Publish new Release’ GitHub Actions Workflow completes the following tasks:

    • Deploy to Production: Validated DAB assets are deployed to the Production Databricks workspace.

    • Publish and Tag: A tagged release is created, marking a clean and stable project milestone.
      ajalisatgi_7-1746572100672.png

    • Maintain Sync: A PR is generated to merge the main branch back into the develop branch only if the release branch has changes that are not present in develop.
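
One possible way to decide whether the sync PR is needed is to count commits that exist on the release branch but not on develop. The Python sketch below is an assumption about how such a check could be done, not a description of the actual workflow; it relies on `git rev-list --count <base>..<head>`, which counts commits reachable from `<head>` but not from `<base>`. The function names are hypothetical.

```python
def rev_list_command(base: str, head: str) -> list:
    """Build the git command that counts commits on `head` missing from `base`."""
    return ["git", "rev-list", "--count", f"{base}..{head}"]


def needs_sync_pr(count_output: str) -> bool:
    """Interpret the command's output: a sync PR is needed if the count is > 0."""
    return int(count_output.strip()) > 0
```

In a CI step, the output of `rev_list_command("develop", "release/1.5.0")` (branch name is an example) would be captured with `subprocess.run(..., capture_output=True, text=True)` and passed to `needs_sync_pr`.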

 

Hotfix Release Process

ajalisatgi_3-1746637316882.png

In the fast-paced world of software engineering, efficient handling of critical issues is paramount. The hotfix release process empowers developers and DevOps engineers alike to swiftly address urgent fixes, minimizing disruption and downtime. A hotfix release should be used when an urgent fix is required for a critical bug or security flaw that exists in the production (live) environment. This is distinct from the standard development process, which typically handles new features, improvements, and non-urgent bug fixes through the develop branch and regular release cycles. Here's a structured, step-by-step breakdown of the process:

Step 1: Developer initiates hotfix

  • The developer identifies a critical issue and creates a hotfix branch from the main branch.
  • They implement the required fix and open a Pull Request (PR) against the main branch for approvals.

ajalisatgi_2-1746573217168.png

Step 2: Manual trigger by DevOps engineer

  • A DevOps engineer manually triggers the ‘Draft new hotfix’ GitHub Actions workflow by referencing the original PR number created by the developer.

ajalisatgi_3-1746573278818.gif

  • A GitHub Actions workflow is triggered for ‘Draft new hotfix’:

ajalisatgi_4-1746573313796.gif

Step 3: GitHub Actions workflow – Drafting new hotfix

  • GitHub Actions retrieves the current release version.
  • Automatically increments the patch version.
  • Creates a new versioned hotfix branch based on the developer’s original fix.
  • Generates a Pull Request to merge this versioned hotfix branch into the main branch.

ajalisatgi_5-1746573352365.png

  • Closes the original developer-created hotfix PR.

ajalisatgi_6-1746573400666.png
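
The drafting steps above can be summarized as a small planning function. This Python sketch is illustrative only: it assumes `MAJOR.MINOR.PATCH` versions and a `hotfix/<version>` branch naming scheme, neither of which is confirmed by this post, and the function name `plan_hotfix` and the dictionary keys are hypothetical.

```python
def plan_hotfix(current_version: str, original_pr: int) -> dict:
    """Describe what the 'Draft new hotfix' step would do.

    Bumps only the patch component, names the versioned hotfix branch,
    targets main with the new PR, and records the original PR to close.
    """
    major, minor, patch = (int(p) for p in current_version.lstrip("v").split("."))
    new_version = f"{major}.{minor}.{patch + 1}"
    return {
        "version": new_version,
        "branch": f"hotfix/{new_version}",  # assumed naming scheme
        "pr_base": "main",                  # versioned hotfix PR targets main
        "close_pr": original_pr,            # developer's original PR is closed
    }
```

Keeping the plan as plain data makes it easy to log or review before any branch or PR is actually created.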

Step 4: DevOps Approval and Merge

  • The DevOps engineer obtains approval from reviewers and merges the new versioned hotfix PR into the main branch.

Step 5: Automated GitHub Actions Workflow – Deployment and Release

  • The merge into the main branch triggers the ‘Publish new release’ GitHub Actions workflow.

ajalisatgi_7-1746573462048.gif

  • The hotfix is automatically deployed to the production Databricks workspace.
  • A tagged release is created for easy tracking.

ajalisatgi_8-1746573533509.png

  • A Pull Request is automatically generated to merge changes back into the develop branch.

ajalisatgi_9-1746573592610.png

 

Conclusion

Integrating GitFlow with Databricks Asset Bundles (DABs) and GitHub Actions offers a structured and repeatable approach to managing data workflows, especially for teams transitioning from ad-hoc development practices or inconsistent deployment processes. While GitFlow is not the only Git branching strategy available, it introduces a level of discipline that can be particularly useful in environments where stability, release isolation, and clear promotion paths are critical.

Compared to more freeform approaches, such as manually managing branches without a defined release process, GitFlow provides several key advantages:

  • Clear structure: It defines how features, releases, and hotfixes should be handled, helping teams avoid accidental merges or last-minute cherry-picks.

  • Environment alignment: With DABs, teams can consistently package and promote code across dev, QA, and prod, reducing the drift between environments.

  • Release control: By using release and hotfix branches, teams can test and stabilize changes in isolation before merging to main.

  • Automation-friendly: GitHub Actions makes it easy to automate tests, validations, and deployments tied to specific branch events.

That said, GitFlow may introduce overhead in fast-moving teams or those practicing trunk-based development, where short-lived branches and rapid integration are prioritized. But if your team needs a clear separation between development and release efforts—or a reliable process for managing emergency fixes—GitFlow provides a strong foundation.

For data teams working in Databricks, adopting GitFlow alongside DABs and CI/CD workflows can bring much-needed clarity, traceability, and confidence to your deployment process. This is especially true if you’re moving away from an informal or inconsistent setup.