Databricks CI/CD with Azure DevOps

Stellar
New Contributor II

Hi all,

I am looking for advice on the best approach to CI/CD in Databricks, and to repository setup in general. Should we have a main branch and create branches off of it, or use a different strategy? How should changes be propagated from dev to QA, and then from QA to prod? Should jobs run notebooks directly from Git? Should only the dev workspace be connected to Git?

Any pointers, advice, or help would be very welcome!

 

Accepted Solution

Kaniz
Community Manager

Hi @Stellar, setting up a robust CI/CD (Continuous Integration/Continuous Deployment) pipeline for Databricks takes careful planning and adherence to best practices.

Let’s break down the key aspects:

  1. Development Workflow:

    • Branching Strategy: Keep one long-lived main branch (also known as master or mainline) as the stable base, and have developers create short-lived feature branches for their work. Avoid committing directly to main; in Azure DevOps, a branch policy on main can require that all changes arrive through reviewed pull requests.
    • Feature Branches: Each developer works on their own feature branch. When the work is ready, they merge it into main, typically through a pull request in Azure DevOps, or directly from the Repos UI.
    • Git Integration: Set up Git integration for your Databricks workspace by adding Azure DevOps credentials, so developers can clone the repository, switch branches, commit, and push from the workspace.
  2. Propagation of Changes:

    • Dev to QA: After developing and testing in the feature branch, merge the changes into main. Set up a QA environment (ideally a separate Databricks workspace) and have the pipeline deploy main there, for example by updating a Repo in the QA workspace to the latest commit of main. Run QA tests in that environment.
    • QA to Prod: Once QA passes, promote the tested code to production with the same process used for dev to QA: deploy main to the production workspace, ideally pinned to the exact commit (or a release tag) that passed QA, so prod runs exactly what was tested.
  3. Jobs and Notebooks:

    • Notebooks in Git Repos: Store your Databricks notebooks in Git repositories. This provides source control and version history.
    • Job Definitions: Define your Databricks jobs so that they run notebooks from the Git repository. In the job definition, specify the remote repository and Git ref (for example, the main branch of your Azure DevOps or GitHub repository) together with the notebook's path relative to the repository root.
  4. Workspace Connections:

    • Dev Workspace: Connect your development workspace to Git. Developers work here, create feature branches, and commit changes.
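    • QA and Prod Workspaces: These workspaces do not need to be connected to Git for interactive development. Instead, the deployment pipeline (or the jobs themselves) pulls the main branch from the Git provider when needed, using Git credentials for a service principal or dedicated service account rather than a personal account.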
  5. Merge Conflicts:

    • Resolve Conflicts: When merging branches (e.g., from feature branches to the main branch), resolve any merge conflicts. The Repos UI provides tools for conflict resolution.
  6. Terraform Integration (Optional):

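    • If you manage infrastructure as code with Terraform, the Databricks Terraform provider can also manage Repos (the databricks_repo resource) and jobs (databricks_job), which keeps the dev, QA, and prod workspaces configured consistently.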
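
As a minimal sketch of the branching and promotion flow in items 1 and 2, a pipeline step could decide where each run deploys. This assumes a hypothetical convention of one `main` branch plus `feature/...` branches, and a hypothetical set of commit IDs already approved in QA (how you record that is up to your pipeline); `next_deployment` and `Deployment` are illustrative names, not part of any Databricks or Azure DevOps API.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical naming convention: one long-lived "main" plus short-lived feature branches.
FEATURE_BRANCH = re.compile(r"feature/[a-z0-9][a-z0-9._-]*")


def branch_name(ref: str) -> str:
    """Strip the 'refs/heads/' prefix used by Git refs (e.g. Azure Pipelines' Build.SourceBranch)."""
    prefix = "refs/heads/"
    return ref[len(prefix):] if ref.startswith(prefix) else ref


@dataclass(frozen=True)
class Deployment:
    environment: str  # "qa" or "prod"
    commit: str       # exact commit to deploy, so prod receives what QA tested


def next_deployment(ref: str, commit: str, qa_passed: set[str]) -> Deployment | None:
    """Decide the next deployment target in a dev -> QA -> prod flow (hypothetical policy)."""
    name = branch_name(ref)
    if FEATURE_BRANCH.fullmatch(name):
        return None  # feature work stays in the dev workspace until merged into main
    if name != "main":
        raise ValueError(f"branch {name!r} does not follow the naming convention")
    if commit in qa_passed:
        return Deployment("prod", commit)  # promote the exact commit QA approved
    return Deployment("qa", commit)        # new commit on main: deploy to QA first
```

Passing the commit ID through to prod, rather than deploying "whatever is on main now", keeps prod from picking up changes merged after QA signed off.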

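For items 3 and 4, the deploy step largely comes down to two REST calls against the target workspace: creating a job whose notebook task runs from Git via `git_source` (Jobs API 2.1), and pointing an existing Repo in QA or prod at the latest commit of main (Repos API). The sketch below only builds the requests with the Python standard library; the host, token, repository URL, notebook path, cluster ID, and Repo ID are hypothetical placeholders, and authentication is assumed to use a bearer token (a personal access token or Microsoft Entra ID token).

```python
import json
import urllib.request


def jobs_create_payload(name: str, repo_url: str, branch: str,
                        notebook_path: str, cluster_id: str) -> dict:
    """Jobs API 2.1 settings for a job that runs a notebook straight from Git."""
    return {
        "name": name,
        "git_source": {
            "git_url": repo_url,
            "git_provider": "azureDevOpsServices",
            "git_branch": branch,
        },
        "tasks": [{
            "task_key": "run_notebook",
            "existing_cluster_id": cluster_id,
            "notebook_task": {
                # With source "GIT", the path is relative to the repository root.
                "notebook_path": notebook_path,
                "source": "GIT",
            },
        }],
    }


def databricks_request(host: str, token: str, method: str, path: str,
                       body: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated Databricks REST request."""
    return urllib.request.Request(
        url=f"{host.rstrip('/')}{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method=method,
    )


# Hypothetical placeholders: substitute your QA workspace URL, repository, cluster, and Repo ID.
QA_HOST = "https://adb-0000000000000000.0.azuredatabricks.net"
REPO_URL = "https://dev.azure.com/my-org/my-project/_git/my-repo"

create_job = databricks_request(
    QA_HOST, "<token>", "POST", "/api/2.1/jobs/create",
    jobs_create_payload("nightly-ingest", REPO_URL, "main", "notebooks/ingest", "<cluster-id>"),
)
# Move an existing Repo in the QA workspace to the latest commit of main.
update_repo = databricks_request(QA_HOST, "<token>", "PATCH", "/api/2.0/repos/123", {"branch": "main"})
```

Sending a request is then `urllib.request.urlopen(create_job)`. In an Azure DevOps pipeline, the Databricks CLI or SDK can make the same calls; the raw payloads are shown here only to make the fields visible.
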
Remember, this is a high-level overview; the detailed implementation will depend on your specific requirements and organizational practices. For more in-depth guidance, refer to the official Databricks documentation on CI/CD techniques with Git and Databricks Repos. Happy coding! 🚀
