Is it possible to use the pygit2 or GitPython packages to reference Git repositories from within Databricks?

tompile
New Contributor III

I am using Repos in Databricks and am trying to get the current Git branch from within a notebook session.

For example:

from pygit2 import Repository

repo = Repository('/Workspace/Repos/user@domain/repository')

The code above throws an error stating that the repository cannot be found. GitPython throws similar errors. It seems to me that Databricks Repos are configured in a way that prevents these packages from recognising them.
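For context: before a Git library can open a repository, it has to locate the repository's `.git` entry, typically by checking the given directory and then its parents. The sketch below illustrates that kind of upward search. It is only an approximation and does not reproduce pygit2's or GitPython's exact lookup rules. `find_git_dir` is a hypothetical helper, and the path is the placeholder from the question.

```python
from pathlib import Path
from typing import Optional


def find_git_dir(start: str) -> Optional[Path]:
    """Walk up from `start` looking for a `.git` entry.

    Returns the directory that contains `.git`, or None if neither
    `start` nor any of its ancestors has one.
    """
    path = Path(start).resolve()
    for candidate in [path, *path.parents]:
        # `.git` is usually a directory, but in worktrees and submodules
        # it can be a plain file, so accept either.
        if (candidate / ".git").exists():
            return candidate
    return None


# If no `.git` entry is exposed under the Repos path, this prints None,
# which matches the "repository cannot be found" error.
print(find_git_dir("/Workspace/Repos/user@domain/repository"))
```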

Does anyone have any experience of this?

Thanks

Accepted Solution

niburg123
New Contributor III

As far as I know, you cannot use these packages with Repos. However, if you are calling code from your repo via a notebook, you can use the Repos REST API as a workaround:

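import json
import requests

# Repo path as reported by the Repos API (placeholder values)
repo_path = "/Repos/xyz_repo_path/xyz_repo_name"
# The same repo as seen from the driver's filesystem (not used below)
repo_path_fs = "/Workspace" + repo_path
# Branch the notebook is expected to be running from
repo_branch = "main"


def checkRepoInfo():
    # `dbutils` only exists inside a Databricks notebook; the notebook
    # context carries the workspace API URL and a token for this session
    nb_context = json.loads(dbutils.notebook.entry_point.getDbutils().notebook().getContext().toJson())
    api_url = nb_context['extraContext']['api_url']
    api_token = nb_context['extraContext']['api_token']

    # List the workspace's repos through the Repos REST API
    response = requests.get(f"{api_url}/api/2.0/repos", headers={"Authorization": f"Bearer {api_token}"})
    response.raise_for_status()
    db_repo_data = response.json()

    for db_repo in db_repo_data.get("repos", []):
        if db_repo["path"] == repo_path:
            db_repo_id = db_repo["id"]
            db_repo_branch = db_repo.get("branch")
            db_repo_head_commit = db_repo.get("head_commit_id")
            print("Git commit info: ID: {} | Path: {} | Branch: {} | Commit: {}".format(
                db_repo_id, db_repo["path"], db_repo_branch, db_repo_head_commit))
            # Fail if the repo is not checked out on the expected branch
            assert db_repo_branch == repo_branch


checkRepoInfo()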
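The workaround above only inspects a single response from `GET /api/2.0/repos`. As I understand the Repos API reference, the list endpoint accepts a `path_prefix` query parameter and returns a `next_page_token` when more results exist. The following is a minimal sketch under those assumptions that follows the pages and narrows the search. `iter_repos`, `find_repo` and `make_fetch_page` are hypothetical helpers, and the page fetcher is passed in as a function so that the lookup logic can be tested without a workspace.

```python
from typing import Callable, Dict, Iterator, Optional

# Takes a page token (None for the first page) and returns the decoded
# JSON body of one `GET /api/2.0/repos` call.
FetchPage = Callable[[Optional[str]], Dict]


def iter_repos(fetch_page: FetchPage) -> Iterator[Dict]:
    """Yield every repo entry, following `next_page_token` until it is absent."""
    token = None
    while True:
        page = fetch_page(token)
        yield from page.get("repos", [])
        token = page.get("next_page_token")
        if not token:
            return


def find_repo(fetch_page: FetchPage, repo_path: str) -> Optional[Dict]:
    """Return the repo entry whose `path` equals `repo_path`, or None."""
    for repo in iter_repos(fetch_page):
        if repo.get("path") == repo_path:
            return repo
    return None


def make_fetch_page(api_url: str, api_token: str,
                    path_prefix: Optional[str] = None) -> FetchPage:
    """Build a fetcher backed by the Repos REST API (assumed parameters)."""
    import requests  # imported here so the helpers above need no third-party package

    def fetch_page(token: Optional[str]) -> Dict:
        params: Dict[str, str] = {}
        if path_prefix:
            params["path_prefix"] = path_prefix
        if token:
            params["next_page_token"] = token
        response = requests.get(
            f"{api_url}/api/2.0/repos",
            headers={"Authorization": f"Bearer {api_token}"},
            params=params,
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    return fetch_page
```

In a notebook, with `api_url`, `api_token` and `repo_path` obtained as in `checkRepoInfo`, `find_repo(make_fetch_page(api_url, api_token, path_prefix=repo_path), repo_path)` would return the matching entry, and its `branch` and `head_commit_id` fields could then be checked in the same way.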


7 REPLIES

Kaniz
Community Manager
Community Manager

Hi @Thomas Pile, please go through the documentation that explains Repos for Git integration in Databricks.

ben_406796
New Contributor III

I'm having the same issue. I couldn't find anything in the documentation that @Kaniz Fatma posted that answers this question either.

It looks like the `.git/` subdirectory isn't present at the top level of the repo in Databricks, which seems strange. I don't understand why that would be, or how Git works in Databricks without the `.git/` subdirectory...
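For comparison, in an ordinary clone the checked-out branch is recorded in `.git/HEAD` as a line such as `ref: refs/heads/main`. That file is what a Git library ultimately reads to report the current branch, so without `.git/` there is nothing for it to read. The sketch below reads the branch directly. `read_current_branch` is a hypothetical helper, and it only handles the common case where `.git` is a directory (not the `gitdir:` file used by worktrees and submodules).

```python
from pathlib import Path
from typing import Optional


def read_current_branch(repo_root: str) -> Optional[str]:
    """Return the checked-out branch name from `<repo_root>/.git/HEAD`.

    Returns None if there is no `.git/HEAD` file or if HEAD is detached.
    """
    head_file = Path(repo_root) / ".git" / "HEAD"
    if not head_file.is_file():
        # No `.git/` directory: nothing for a Git library to read.
        return None
    head = head_file.read_text().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    # A detached HEAD holds a bare commit hash instead of a branch ref.
    return None
```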

tompile
New Contributor III

Agreed, it seems very odd. @Kaniz Fatma, are you able to assist any further with this? Is there a particular part of the linked documentation that you think would help?

Hi @Thomas Pile,

Just a friendly follow-up. Were you able to find a solution, or do you still need help? Please let us know.

@Jose Gonzalez, I cannot speak for @Thomas Pile, but I am also struggling with this issue and have been unable to find a solution.

Hi @Jose Gonzalez. I haven't been able to find a solution yet either. Are you able to help?

