Switching Branches using code in notebooks?

damirg
New Contributor

Hi,

I'm working on a project in a Databricks notebook and I'm trying to implement the following workflow:

  1. Create a new branch from Python code
  2. In the next cell, switch the notebook to that newly created branch

I'm able to create the branch without issues, but attempting to switch to it results in errors. Has anyone implemented something similar or found a reliable method to handle the branch switch from within a notebook?

3 REPLIES

pradeep_singh
Contributor

Curious: what are you trying to achieve with this setup? I can imagine you are able to create a branch because that command could be running on the remote server. But switching to a new branch needs to happen on the local server, which in this case is a notebook cell. Why would you want to do this in the first place?

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

saurabh18cs
Honored Contributor III

Hi @damirg, I have the same question as @pradeep_singh, but if you still want to proceed this way, try this:

NOTE: 1) Rename the attached file's extension to .py.
2) Change SSS to your folder name within the Databricks workspace: Repos/<<foldername>>/<<reponame>>
 
See if you can use this to change your branch. The file has more methods than you need, so use only the ones that apply.
 
Pseudocode below (modify it for your use case; the helper functions come from the attached file):
  import json

  repo_url, repo_name, branch_name = get_repo_info()
  repo_config = get_repo_config(repo_name, branch_name, repo_url)
  print(json.dumps(repo_config, indent=2))

  repo_exists = get_existing_repo(databricks_url, databricks_token, repo_name)
  print(json.dumps(repo_exists, indent=2) if repo_exists else "Repo does not exist")

  if repo_exists:
      # The Git folder already exists: point it at the target branch
      repo_id = repo_exists.get('id')
      if not repo_id:
          # Raise so the cell fails with a clear error (exit(1) is meant for scripts, not notebooks)
          raise RuntimeError("Repo Id is null")
      update_repo(databricks_url, databricks_token, repo_id, repo_config)
      print(f"Repository {repo_name} updated.")
  else:
      # No Git folder yet: create the parent folder, clone the repo, then check out the branch
      create_workspace_directory(databricks_workspace_url, databricks_token)
      repo_id = create_repo(databricks_url, databricks_token, repo_config)
      print(f"Repo id: {repo_id}")
      if not repo_id:
          raise RuntimeError("Repo Id is null")
      update_repo(databricks_url, databricks_token, repo_id, repo_config)
      print(f"Repository {repo_name} created.")
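The helpers above live in the attached file, which is not reproduced in this thread. Purely as an illustration, a hypothetical get_repo_config could build two payloads: one for creating the Git folder (POST /api/2.0/repos takes url, provider and path) and one for checking out the branch (PATCH /api/2.0/repos/{repo_id} takes branch). That split would explain why the pseudocode calls update_repo right after create_repo. The function body, the host-to-provider mapping, and the default folder name "SSS" are assumptions for this sketch, not the attached implementation.

```python
from urllib.parse import urlparse

# Hypothetical mapping from Git host to the Repos API "provider" value.
PROVIDERS = {
    "github.com": "gitHub",
    "gitlab.com": "gitLab",
    "bitbucket.org": "bitbucketCloud",
    "dev.azure.com": "azureDevOpsServices",
}


def get_repo_config(repo_name, branch_name, repo_url, folder_name="SSS"):
    """Hypothetical helper: build the create and update payloads for one Git folder."""
    host = urlparse(repo_url).hostname or ""
    provider = PROVIDERS.get(host)
    if provider is None:
        raise ValueError(f"Unknown Git host: {host!r}")
    return {
        # Body for POST /api/2.0/repos: clone into /Repos/<folder_name>/<repo_name>
        "create": {
            "url": repo_url,
            "provider": provider,
            "path": f"/Repos/{folder_name}/{repo_name}",
        },
        # Body for PATCH /api/2.0/repos/{repo_id}: check out the requested branch
        "update": {"branch": branch_name},
    }
```

For example, get_repo_config("my-repo", "feature-x", "https://github.com/your-org/my-repo.git", folder_name="your_folder") would target /Repos/your_folder/my-repo and branch feature-x. Hosts outside the small mapping (for example GitHub Enterprise) would need their own entries.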

 

SteveOstrowski
Databricks Employee

Hi,

Great question! Yes, you can switch Git branches programmatically in Databricks -- there are a few approaches depending on your use case.


OPTION 1: DATABRICKS PYTHON SDK (RECOMMENDED FOR NOTEBOOKS)

The simplest approach from within a notebook is to use the Databricks Python SDK, which comes pre-installed on recent Databricks Runtime versions (on older runtimes, install it with %pip install databricks-sdk). You can use the ReposAPI.update() method to switch branches:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

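# First, find your repo by path (path_prefix is a prefix match, so keep only the exact path)
repo_path = "/Repos/your_username/your_repo_name"
repos = [r for r in w.repos.list(path_prefix=repo_path) if r.path == repo_path]
repo = repos[0]  # IndexError here means no Git folder exists at repo_path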

print(f"Current branch: {repo.branch}")
print(f"Repo ID: {repo.id}")

# Switch to a different branch
w.repos.update(repo_id=repo.id, branch="my-feature-branch")

print("Branch switched successfully!")

You can also switch to a specific tag instead of a branch:

w.repos.update(repo_id=repo.id, tag="v1.0.0")

Docs: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/repos.html
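If you want the notebook cell to fail loudly when the checkout did not take effect, one option is to read the repo back after the update. This is a minimal sketch, assuming w.repos.get(repo_id) returns an object with a branch attribute (as in current SDK versions); switch_branch is a hypothetical helper name, not part of the SDK.

```python
def switch_branch(w, repo_id, branch):
    """Hypothetical helper: check out `branch` in a Git folder and confirm the result.

    `w` is expected to be a databricks.sdk.WorkspaceClient, or any object whose
    `repos` attribute offers the same `get` and `update` methods.
    """
    if w.repos.get(repo_id).branch == branch:
        return branch  # already on the target branch, skip the update call
    w.repos.update(repo_id=repo_id, branch=branch)
    # Read the Git folder back to confirm the checkout took effect
    actual = w.repos.get(repo_id).branch
    if actual != branch:
        raise RuntimeError(f"Expected branch {branch!r}, but the Git folder is on {actual!r}")
    return actual
```

Usage would look like switch_branch(w, repo.id, "my-feature-branch"). Skipping the update when the folder is already on the target branch avoids an unnecessary checkout, which matters given the state-loss caveats that come with switching.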


OPTION 2: REST API VIA REQUESTS

If you prefer using the REST API directly, you can call the PATCH /api/2.0/repos/{repo_id} endpoint:

import requests

# Get your workspace URL and token from notebook context
workspace_url = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiUrl().getOrElse(None)
token = dbutils.notebook.entry_point.getDbutils().notebook().getContext().apiToken().getOrElse(None)

headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}

# Step 1: Find your repo ID
response = requests.get(
    f"{workspace_url}/api/2.0/repos",
    headers=headers,
    params={"path_prefix": "/Repos/your_username/your_repo_name"},
)
response.raise_for_status()  # surface HTTP errors instead of a confusing KeyError
repo_id = response.json()["repos"][0]["id"]

# Step 2: Switch branch
response = requests.patch(
    f"{workspace_url}/api/2.0/repos/{repo_id}",
    headers=headers,
    json={"branch": "my-feature-branch"},
)
response.raise_for_status()

print(response.json())

Docs: https://docs.databricks.com/api/workspace/repos/update
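The list endpoint matches on a path prefix and returns results in pages (the response carries a next_page_token when more results exist), so taking the first entry can pick the wrong folder, for example /Repos/you/app_v2 when you asked for /Repos/you/app. Below is a sketch of an exact-match lookup that follows the pages. find_repo_id is a hypothetical name, and the HTTP function is passed in (for example requests.get) so it can be swapped or stubbed.

```python
def find_repo_id(http_get, workspace_url, headers, repo_path):
    """Hypothetical helper: return the ID of the Git folder whose path is exactly repo_path."""
    params = {"path_prefix": repo_path}
    while True:
        response = http_get(f"{workspace_url}/api/2.0/repos", headers=headers, params=params)
        response.raise_for_status()
        body = response.json()
        # Prefix matches can include sibling folders, so compare the full path
        for repo in body.get("repos", []):
            if repo.get("path") == repo_path:
                return repo["id"]
        token = body.get("next_page_token")
        if not token:
            raise LookupError(f"No Git folder found at {repo_path}")
        # Request the next page with the same prefix filter
        params = {"path_prefix": repo_path, "next_page_token": token}
```

In the notebook above, this would be called as repo_id = find_repo_id(requests.get, workspace_url, headers, "/Repos/your_username/your_repo_name").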


OPTION 3: DATABRICKS CLI

If you are automating from outside a notebook (e.g., CI/CD pipelines):

# Find your repo ID first
databricks repos list --path-prefix /Repos/your_username/your_repo_name

# Switch to a branch
databricks repos update --repo-id <repo-id> --branch my-feature-branch

Docs: https://docs.databricks.com/dev-tools/cli/repos-cli.html


IMPORTANT CAVEATS

1. Notebook state is lost -- Switching branches that alter notebook source code will clear cell outputs, comments, version history, and widgets.

2. Uncommitted changes carry over -- If you have uncommitted changes on the current branch, they will carry over to the new branch unless they conflict with code on the target branch.

3. Don't switch while jobs are running -- If a job is executing notebooks from a Git folder and you switch branches mid-run, some notebooks may reflect the old branch while others reflect the new one.

4. Repos API not supported with CLI-enabled Git folders -- If your Git folder has "Git CLI access" enabled (beta), the Repos API is not supported. You would need to use git checkout commands directly in a web terminal instead.

5. Switching branches on a repo you're currently running in -- Be careful if the notebook calling w.repos.update() is itself inside the repo you're switching. The notebook code in memory will continue to run, but any subsequent %run calls or file reads will pull from the new branch.
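For caveat 5, one way to avoid switching the folder a notebook is running from is to compare the notebook's own path with the Git folder path before calling the update. This sketch assumes you read the notebook path from the notebook context (for example dbutils.notebook.entry_point.getDbutils().notebook().getContext().notebookPath().get(), an internal API that may not be available on every cluster type) and that either path may or may not carry a /Workspace prefix. is_inside_repo is a hypothetical helper.

```python
def _normalize(path):
    # Git folders can appear as /Repos/... or /Workspace/Repos/...; compare without the prefix
    if path.startswith("/Workspace/"):
        path = path[len("/Workspace"):]
    return path.rstrip("/")


def is_inside_repo(notebook_path, repo_path):
    """Hypothetical helper: True if notebook_path lives inside the Git folder at repo_path."""
    notebook_path = _normalize(notebook_path)
    repo_path = _normalize(repo_path)
    # Compare on a "/" boundary so /Repos/u/app does not match /Repos/u/app_v2
    return notebook_path == repo_path or notebook_path.startswith(repo_path + "/")
```

A notebook could then refuse to continue with something like: if is_inside_repo(current_path, "/Repos/your_username/your_repo_name"): raise RuntimeError("Run the branch switch from a notebook outside this Git folder").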


BEST PRACTICE: SEPARATE REPOS PER BRANCH

The recommended Databricks workflow is for each developer to have their own Git folder mapped to the same remote repository, working in their own development branch. Rather than switching branches in a single repo, consider cloning the repo to a different path for each branch you need:

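repo = w.repos.create(
    url="https://github.com/your-org/your-repo.git",
    provider="gitHub",
    path="/Repos/your_username/your_repo_feature_branch",
)

# A new Git folder starts on the repository's default branch; check out the one you need
w.repos.update(repo_id=repo.id, branch="my-feature-branch")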

This avoids the state-loss issues that come with branch switching.
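To apply the one-folder-per-branch pattern repeatably, you can derive the folder path from the branch name and only clone when that path does not exist yet. This is a sketch under stated assumptions: the <repo>_<branch> naming scheme (with "/" and other unsafe characters replaced by "-") and the helper names branch_folder_path and ensure_branch_folder are illustrative, not a Databricks convention, and it assumes w.repos.create returns an object with an id attribute.

```python
import re


def branch_folder_path(user_folder, repo_name, branch):
    """Hypothetical helper: map a branch name to its own Git folder path."""
    # Replace anything other than letters, digits, '.', '_' and '-' (e.g. '/') with '-'
    safe_branch = re.sub(r"[^A-Za-z0-9._-]+", "-", branch).strip("-")
    return f"/Repos/{user_folder}/{repo_name}_{safe_branch}"


def ensure_branch_folder(w, url, provider, path, branch):
    """Hypothetical helper: clone into `path` if it is missing, then check out `branch`.

    `w` is expected to be a databricks.sdk.WorkspaceClient (or an object with the
    same repos.list / repos.create / repos.update methods).
    """
    existing = [r for r in w.repos.list(path_prefix=path) if r.path == path]
    if existing:
        repo_id = existing[0].id
    else:
        repo_id = w.repos.create(url=url, provider=provider, path=path).id
    w.repos.update(repo_id=repo_id, branch=branch)
    return repo_id
```

For example, branch_folder_path("your_username", "your_repo", "feature/login") gives "/Repos/your_username/your_repo_feature-login". Running ensure_branch_folder twice for the same branch reuses the folder instead of cloning again.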


DOCUMENTATION REFERENCES

- Git integration for Databricks Git folders: https://docs.databricks.com/repos/
- Git operations with Git folders: https://docs.databricks.com/repos/git-operations-with-repos.html
- Repos REST API: https://docs.databricks.com/api/workspace/repos
- Databricks SDK for Python - ReposAPI: https://databricks-sdk-py.readthedocs.io/en/latest/workspace/workspace/repos.html
- Git folders limitations: https://docs.databricks.com/en/repos/limits.html

Hope this helps! Let me know if you have any follow-up questions.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.