Data Engineering

How to use the git CLI in databricks?

Oliver_Angelil
Valued Contributor II

After making some changes in my feature branch, I committed and pushed some work to Azure DevOps (note that I have not yet raised a PR or merged into any other branch). Many of the files I committed are data files, so I would like to reverse the commit and push using this command:

git reset --soft HEAD^ 

The git functionality in the GUI is limited - how can I use the git CLI?

Thank you.

8 REPLIES

Anonymous
Not applicable

@Oliver Angelil:

To use the Git CLI in Databricks, you first need to set up Git credentials and clone your Git repository. Here are the steps:

  1. Generate a Personal Access Token (PAT) on Azure DevOps that has the required permissions to clone the repository. You can follow the instructions in this Microsoft documentation to create a PAT: https://docs.microsoft.com/en-us/azure/devops/organizations/accounts/use-personal-access-tokens-to-a...
  2. In your Databricks workspace, click the "User Settings" icon in the sidebar and select "Git Integrations".
  3. Click "Add" and fill out the Git integration form: choose your Git provider (Azure DevOps here), then enter your Git username and the PAT generated in step 1.
  4. Once the credential is saved, you can run Git CLI commands from a notebook attached to a cluster by starting a cell with the %sh magic command. For example, to clone your Git repository onto the cluster's driver node, run the following in a cell:
%sh
git clone <repository-url>

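  5. To undo the commit, leave the data files out, and push the rewritten branch, run the commands below. A soft reset keeps every change staged, so unstage the data files before committing again. Because the original commit is already on the remote, a plain push would be rejected and the push has to be forced:

%sh
# Each %sh cell starts in the default working directory, so change into the clone first.
cd <repository-directory>
git reset --soft HEAD^
git reset HEAD -- <data-files>
git commit -m "Re-commit without data files"
git push --force-with-lease origin <branch-name>

Replace <repository-url> with the actual URL of your Git repository, <repository-directory> with the folder created by git clone, <data-files> with the paths of the data files to leave out, and <branch-name> with the name of your feature branch.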
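To see what undoing a pushed commit involves before touching a real branch, the sequence can be rehearsed in a throwaway sandbox. The following is a minimal sketch, not a Databricks-specific recipe: it assumes git is installed, uses a local bare repository as a stand-in for the Azure DevOps remote, and all names in it (demo-remote.git, work, feature, code.py, data.csv) are hypothetical. In a notebook it would go in a single %sh cell.

```shell
#!/usr/bin/env bash
# Sketch: undo a pushed commit that included a data file, then push the rewritten branch.
# Runs entirely in a temp directory against a local bare "remote" (hypothetical names).
set -euo pipefail

sandbox="$(mktemp -d)"
cd "$sandbox"

# Identity used for the demo commits only.
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"

git init --quiet --bare demo-remote.git
git init --quiet work
cd work
git remote add origin "$sandbox/demo-remote.git"

# A first commit, then a feature branch that is pushed to the remote.
echo "print('v1')" > code.py
git add code.py
git commit --quiet -m "Initial code"
git checkout --quiet -b feature
git push --quiet -u origin feature

# The commit to undo: a code change plus a data file committed by mistake.
echo "print('v2')" > code.py
echo "1,2,3" > data.csv
git add code.py data.csv
git commit --quiet -m "Update code (and data by mistake)"
git push --quiet origin feature

# 1. Move the branch back one commit; the changes stay staged.
git reset --quiet --soft HEAD^
# 2. Unstage the data file so it is left out of the new commit (it stays on disk).
git reset --quiet HEAD -- data.csv
# 3. Commit the remaining code change.
git commit --quiet -m "Update code"
# 4. The remote still holds the old commit, so the push has to be forced;
#    --force-with-lease refuses if someone else pushed in the meantime.
git push --quiet --force-with-lease origin feature

# Files tracked on the remote branch after the rewrite.
tracked_files="$(git ls-tree -r --name-only origin/feature)"
echo "$tracked_files"
```

Rewriting a pushed branch is reasonable here only because no PR has been raised and nobody else has pulled the branch; on a shared branch, `git revert` would be the safer choice.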

Thank you very much, @Suteja Kanuri. That looks to be working well. However, I have already cloned the repo in the Repos section in Databricks. How can I navigate to the location of the cloned repo from the shell?

For example, this is the default location:

[screenshot: image.png]

and there is no git repo in this location:

[screenshot]

Thank you very much in advance.


Anonymous
Not applicable

@Oliver Angelil:

If you have already cloned the Git repository in the Repos section of Databricks, you can navigate to the location of the cloned repository in the shell CLI by using the Databricks File System (DBFS) path.

Here are the steps to navigate to the cloned repository location in the shell CLI:

1) Open a new notebook in Databricks and execute the following command to list the DBFS mount points:

%fs mounts

2) Look for the mount point corresponding to the storage account where the repository is cloned. For example, if the repository is cloned in Azure Blob Storage, you should see a mount point for the storage account.

3) In the shell CLI, navigate to the DBFS mount point using the following command:

cd /mnt/<mount_point_name>/

4) Navigate to the folder that corresponds to the name of the Git repository using the following command:

cd <repository_name>

5) You can now use the Git CLI commands to perform operations on the cloned repository.

Please note that the mount point name and the local folder name may differ from the repository's name on the Git host.
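It is not obvious from a directory's name whether it actually contains the clone, so it helps to ask Git directly whether a candidate path is inside a working tree. The sketch below assumes git is installed; `repo_root` is a hypothetical helper name, and the demo directories are throwaway stand-ins for whatever path you want to check. It uses `git -C <path>` rather than `cd`, which also suits notebooks, because each %sh cell starts a new shell and a `cd` in one cell does not carry over to the next.

```shell
#!/usr/bin/env bash
# Sketch: report the Git working-tree root for a candidate directory, if there is one.
# repo_root is a hypothetical helper, not a Git or Databricks command.
set -euo pipefail

repo_root() {
  # git -C runs the command as if it had been started in "$1", so no cd is needed.
  # rev-parse --show-toplevel prints the working-tree root, or fails outside a repo.
  git -C "$1" rev-parse --show-toplevel 2>/dev/null
}

# Demo in a throwaway directory (hypothetical names): one repo, one plain folder.
sandbox="$(mktemp -d)"
git init --quiet "$sandbox/my-repo"
mkdir -p "$sandbox/my-repo/src" "$sandbox/not-a-repo"

for dir in "$sandbox/my-repo/src" "$sandbox/not-a-repo"; do
  if root="$(repo_root "$dir")"; then
    echo "$dir is inside the working tree at $root"
  else
    echo "$dir is not inside a Git working tree"
  fi
done
```

Once a path passes this check, other commands can be run the same way, for example `git -C <path> status` or `git -C <path> log --oneline`.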

Thanks @Suteja Kanuri​, that is very helpful. I see 24 mounts when running:

%fs mounts

I am not sure which one hosts the repos shown in the Repos section in Databricks. When cloning a repo there, there is no option to choose which mount to clone into (one only specifies the URL of the remote repo). Is there a default location (e.g. DatabricksRoot) where the clones go?

Thank you very much.

Anonymous
Not applicable

Hi @Oliver Angelil​ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

@Vidula Khanna, I am still waiting for a response from @Suteja Kanuri.

Anonymous
Not applicable

Hi @Oliver Angelil​ 

I have forwarded your request to Suteja. She will respond to you soon.

Have a great day ahead!

Happy Learning!

Kayla
Valued Contributor

I'm also curious about this question - does anyone have an answer? Being able to use the full repertoire of git commands inside Databricks would be quite useful.
