CI/CD pipeline using GitHub
09-27-2023 01:38 PM
Hi Team,
I've recently begun working with Databricks and I'm exploring options for setting up a CI/CD pipeline to pull the latest code from GitHub.
Whenever a push is made to the main branch, I need to pull the latest code (.sql) from GitHub and update the corresponding .sql notebook in Databricks, so that the next scheduled run executes the latest code.
I would greatly appreciate guidance on how to accomplish this. Thank you.
@-werners- @hubert_dudek @daniel_sahal @Ajay-Pandey @Rishabh-Pandey @Aviral_Bhardwaj @Vivian_Wilfred @Pat @karthik_p
09-27-2023 02:58 PM
There are several options for CI/CD deployment on Databricks.
- Option 1: You can configure your repo directly in Databricks so that a clone of the branch lives in the workspace: https://docs.databricks.com/en/repos/ci-cd-techniques-with-repos.html
- Option 2: You can take a look at Databricks Asset Bundles for local development and automatic deployment: https://github.com/databricks/databricks-asset-bundles-dais2023
- Option 3: If you don’t need all the features that come with asset bundles, you can call the Databricks CLI from GitHub Actions to deploy your code and execute it. You can read more about the CLI here: https://docs.databricks.com/en/dev-tools/cli/index.html
- Other options: You can also look at the other complementary tools that Databricks integrates with: https://docs.databricks.com/en/dev-tools/index-ci-cd.html
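As a rough illustration of how Option 1 can be automated: Databricks exposes a Repos REST endpoint (`PATCH /api/2.0/repos/{repo_id}`) that checks out the head of a branch in a workspace repo. The sketch below, in Python with only the standard library, builds and sends that request. The function names are hypothetical, the host, token, and repo ID are placeholders you would supply, and something (a CI runner or another scheduler) still has to call it after each push to main.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch="main"):
    """Build (but do not send) a request that moves a workspace repo
    to the latest commit of `branch`."""
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            # Assumes a personal access token or similar bearer token.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(host, token, repo_id, branch="main"):
    """Send the request and return the parsed JSON response."""
    req = build_repo_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

If the scheduled job's notebook task points at a path inside that repo, the next run should then execute the updated .sql file.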
09-28-2023 12:24 AM - edited 09-28-2023 12:35 AM
Hi @btafur,
I went through the links. It looks like we have to use a third-party service, such as Jenkins or GitHub Actions, to pull the repo from GitHub into the Databricks production folder, and that would require a separate license. Is it possible to pull the code directly from Databricks?
Thanks for your help.
09-28-2023 06:45 PM
Generally you need some sort of compute to execute the automation, which is why using a third-party tool, even an open-source one, might incur a small additional cost. However, some of these tools also have free tiers.
If all you need is to pull the code into Databricks, you can do it manually using Repos, as mentioned in Option 1. However, any automation will require a server that runs it with one of the third-party or open-source tools: Jenkins, GitHub Actions, Terraform, etc.
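Whatever runner ends up executing the automation, the step itself can be small. For the case in the original question (overwriting a .sql notebook), one possibility is the Workspace import endpoint (`POST /api/2.0/workspace/import`), which accepts base64-encoded source. Here is a minimal sketch of building that payload. The helper name and the target path are hypothetical, and the runner would still need to POST the result with a bearer token.

```python
import base64
import json


def build_sql_import_payload(sql_text, workspace_path):
    """Return the JSON body for importing SQL source as a notebook,
    overwriting whatever already exists at `workspace_path`."""
    encoded = base64.b64encode(sql_text.encode("utf-8")).decode("ascii")
    return {
        "path": workspace_path,
        "format": "SOURCE",   # plain source code rather than an archive
        "language": "SQL",
        "content": encoded,   # the API expects base64-encoded content
        "overwrite": True,
    }


if __name__ == "__main__":
    # Hypothetical path; in CI the SQL text would be read from the checked-out file.
    payload = build_sql_import_payload("SELECT 1", "/Shared/prod/daily_report")
    print(json.dumps(payload, indent=2))
```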
10-04-2023 04:48 AM
FWIW:
We pull manually, but it is possible to automate that at no cost if you use Azure DevOps. There is a free tier (depending on the number of pipelines and their duration).

