Git integration inconsistencies between workspace Git folders and job Git sources
11-28-2024 07:08 AM
It's confusing and limiting that Git integration support is inconsistent between the two available options.
Sparse checkout is only supported when using a workspace Git folder, and checking out by commit hash is only supported when using a remote Git source for a job. I want to check out by commit hash and use sparse checkout!
When using a workspace Git folder to check out a branch, there is a real risk that you do not get the version of the code you intended to deploy. Imagine a CI/CD scenario where you have merged your changes to master and are now running a pipeline to deploy your code to Databricks. As part of the deploy, a workspace Git folder is updated to pull the latest commit from the master branch. While the deploy is running, another pull request is merged into master. Now Databricks is running different code than you intended. I want to avoid this risk by checking out by commit hash.
As a separate question, I'm curious whether your Git checkout uses shallow cloning (no history) or a full-history clone.
12-06-2024 10:25 AM
Sparse Checkout: This feature is only supported when using a workspace Git folder. Sparse checkout allows you to clone and work with only a subset of the remote repository's directories, which is useful for managing large repositories.
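For illustration, here is a minimal sketch of enabling sparse checkout when creating a workspace Git folder through the Repos REST API. The host and token are read from environment variables, and the repository URL, workspace path, and patterns are all placeholders:

```python
import os
import requests

# Assumptions: DATABRICKS_HOST and DATABRICKS_TOKEN are set in the
# environment; the URL, path, and patterns below are placeholders.
host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"{host}/api/2.0/repos",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "url": "https://github.com/my-org/my-repo.git",
        "provider": "gitHub",
        "path": "/Repos/me@example.com/my-repo",
        # Only these directories are materialized in the workspace folder.
        "sparse_checkout": {"patterns": ["notebooks", "src/jobs"]},
    },
)
resp.raise_for_status()
print(resp.json())
```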
Checking Out by Commit Hash: This feature is only supported when using a remote Git source for a job. Checking out by commit hash ensures that you are working with a specific version of the code, which is crucial for maintaining consistency, especially in CI/CD scenarios.
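As a sketch of what pinning a job to a commit looks like, here is a Jobs API 2.1 call that sets git_source.git_commit. The repository URL, commit SHA, cluster ID, and notebook path are placeholders:

```python
import os
import requests

# Assumptions: DATABRICKS_HOST and DATABRICKS_TOKEN are set; the repo
# URL, commit SHA, cluster ID, and notebook path are placeholders.
host = os.environ["DATABRICKS_HOST"]
token = os.environ["DATABRICKS_TOKEN"]

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "deploy-pinned-to-commit",
        # git_commit pins the job to an exact revision, so a concurrent
        # merge to master cannot change what gets deployed.
        "git_source": {
            "git_url": "https://github.com/my-org/my-repo.git",
            "git_provider": "gitHub",
            "git_commit": "0123456789abcdef0123456789abcdef01234567",
        },
        "tasks": [
            {
                "task_key": "main",
                "existing_cluster_id": "1234-567890-abcde123",
                "notebook_task": {
                    "notebook_path": "notebooks/main",
                    "source": "GIT",  # resolve the path against git_source
                },
            }
        ],
    },
)
resp.raise_for_status()
print(resp.json())
```

The same git_source block accepts git_branch or git_tag instead of git_commit, but only the commit form guarantees an exact revision.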
Unfortunately, due to the current limitations, you cannot combine sparse checkout with checking out by commit hash directly within a Databricks workspace Git folder.
To mitigate this risk, you might consider the following workarounds:
- Use Remote Git Source for Jobs: Configure your jobs to use a remote Git source and specify the commit hash you want to check out (as in the Jobs API sketch above). This ensures that the exact version of the code is used during deployment.
- Manual Sparse Checkout: Perform the sparse checkout yourself outside of Databricks, then push the relevant subset of the repository to a new branch or repository that Databricks can use (see the sketch after this list).
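For the second workaround, here is a rough sketch of driving the sparse checkout from a deployment script, assuming git 2.25+ is on the PATH. The repository URL, directory names, and commit SHA are placeholders:

```python
import subprocess

REPO = "https://github.com/my-org/my-repo.git"    # placeholder
SHA = "0123456789abcdef0123456789abcdef01234567"  # placeholder

def run(*args, cwd=None):
    """Run a command, raising if it exits non-zero."""
    subprocess.run(args, cwd=cwd, check=True)

# Blobless, no-checkout clone: small transfer, full commit history,
# so any commit can still be checked out by hash afterwards.
run("git", "clone", "--filter=blob:none", "--no-checkout", REPO, "my-repo")
run("git", "sparse-checkout", "init", "--cone", cwd="my-repo")
# Materialize only the directories the job needs (placeholders).
run("git", "sparse-checkout", "set", "notebooks", "src/jobs", cwd="my-repo")
# Pin to the exact commit; only the sparse paths appear on disk.
run("git", "checkout", SHA, cwd="my-repo")
```

The blobless clone also partially answers the shallow-clone question from the opening post: it keeps history metadata while deferring file contents, which is often a better fit than a depth-limited clone when you need to check out an arbitrary commit.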
01-08-2025 08:19 AM
Thank you for confirming that the information in the opening post is accurate. The workarounds are not acceptable, so feel free to close this issue, or leave it open until a more mature solution is released on the Databricks platform.

