09-27-2023 01:38 PM
Hi Team,
I've recently begun working with Databricks and I'm exploring options for setting up a CI/CD pipeline to pull the latest code from GitHub.
I need to pull the latest .sql code from GitHub whenever a push is made to the main branch and update the corresponding .sql notebook in Databricks, so that each scheduled run executes the latest code.
I would greatly appreciate guidance on how to accomplish this. Thank you
@-werners- @hubert_dudek @daniel_sahal @Ajay-Pandey @Rishabh-Pandey @Aviral_Bhardwaj @Vivian_Wilfred @Pat @karthik_p
09-27-2023 02:58 PM
There are multiple alternatives for CI/CD on Databricks for deployment.
09-28-2023 12:24 AM - edited 09-28-2023 12:35 AM
Hi @btafur ,
I went through the links. It looks like we have to use a third-party service, such as Jenkins or GitHub Actions, to pull the repo from GitHub into the Databricks production folder, and that would require a separate license. Is it possible to pull the code directly from Databricks?
Thanks for your help
09-28-2023 06:45 PM
Generally you need some sort of compute to execute the automation, which is why using a third-party tool, even an open-source one, might incur a small additional cost. However, some of them have free tiers as well, depending on the tool.
If all you need is to pull the code into Databricks, you can do it manually using Repos, as mentioned in Option 1. However, any automation will require a server that runs it with one of the third-party or open-source tools: Jenkins, GitHub Actions, Terraform, etc.
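For illustration, here is a rough sketch of what such an automation step could run after each push to main, whichever tool hosts it. It assumes the repository is already linked in Databricks Repos and uses the Repos REST API (`PATCH /api/2.0/repos/{repo_id}`) to move the checkout to the head of a branch. The function names and the way the host, token, and repo ID are passed in are my own placeholders, not something from the linked docs, so treat it as a starting point rather than a tested pipeline.

```python
import json
import urllib.request


def build_repo_update_request(host, token, repo_id, branch="main"):
    """Build the PATCH request that asks Databricks to check out the head of `branch`.

    host, token, and repo_id are placeholders supplied by the caller
    (for example from CI secrets); none of them come from this thread.
    """
    url = f"{host.rstrip('/')}/api/2.0/repos/{repo_id}"
    body = json.dumps({"branch": branch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def update_repo(host, token, repo_id, branch="main"):
    """Send the request and return the parsed JSON response from the Repos API."""
    request = build_repo_update_request(host, token, repo_id, branch)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The CI job would call something like `update_repo(workspace_url, access_token, repo_id)` right after the push, with those three values read from the job's secrets. A scheduled Databricks job that points at the notebook inside that repo folder should then pick up the latest code on its next run.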
10-04-2023 04:48 AM
FWIW:
We pull manually, but it is possible to automate that at no cost if you use Azure DevOps. There is a free tier (depending on the number of pipelines/duration).