06-14-2024 10:03 PM - edited 06-14-2024 10:05 PM
I recommend against fetching code directly from a Git provider when a workflow runs. Instead, add a task to the job that updates a Git folder within your workspace (article with more details below). That way you can use Databricks to manage permissions for users and service principals, and you get granular isolation that stays entirely within the platform and is easy to trace. For enterprises with on-prem Git data, this also keeps jobs from failing when the Git proxy server is down. The resource below sets up a great solution, but a simpler one is to have the job update the Git folder, with retries, each time it is scheduled.
CI/CD techniques with Git and Databricks Git folders (Repos) | Databricks on AWS
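To illustrate the simpler option (a job task that updates the Git folder with retries), here is a minimal sketch rather than a definitive implementation. The helper name `update_git_folder_with_retries`, its parameters, and the backoff schedule are my own assumptions. The actual update call is passed in as a function, so you can plug in whichever client you use.

```python
import time
from typing import Callable


def update_git_folder_with_retries(
    update_fn: Callable[[], None],
    max_attempts: int = 3,
    base_delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call update_fn until it succeeds and return the number of attempts used.

    update_fn is a hypothetical zero-argument callable that pulls the latest
    commit into the workspace Git folder. The last failure is re-raised once
    max_attempts is exhausted, so the job task fails visibly.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            update_fn()
            return attempt
        except Exception:
            if attempt == max_attempts:
                raise
            # Exponential backoff: base, 2x base, 4x base, ...
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Possible wiring as the first task of a job. This assumes the databricks-sdk
# package is installed; the repo ID and branch are placeholders.
#
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# update_git_folder_with_retries(
#     lambda: w.repos.update(repo_id=123, branch="main")
# )
```

Downstream tasks then run code from the workspace Git folder, so a brief outage of the Git provider or proxy only delays the update step instead of failing every task that needs the code.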
Update: I forgot to add this originally, but the non-PAT (personal access token) authentication options are covered here: Configure Git credentials & connect a remote repo to Databricks | Databricks on AWS