Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Hard reset programmatically

camilo_s
Contributor

Is it possible to trigger a git reset --hard programmatically?

I'm running a platform service where, as part of CI/CD, repos get deployed into the Databricks workspace. Normally, our developers work with upstream repos both from their local IDEs and from the Databricks Code Editor. Sometimes, however, because they mix (and occasionally misuse) the IDE and the Databricks GUI, the repo deployment fails during CI/CD (error code GIT_CONFLICT).

In that case, we'd like to catch this error and do a hard reset to overwrite the Databricks Repo with the state of the upstream repo (which should be the single source of truth anyhow).

Is it possible to do this programmatically?

2 ACCEPTED SOLUTIONS


nicole_lu_PM
Databricks Employee

Hi from the Git folders/Repos PM:

We don't currently have a solution to hard reset programmatically. This is on our roadmap.

If you orchestrate production jobs using Databricks Workflows, there is an option to use version-controlled source code in a Databricks job.

This should eliminate the possibility of executing uncommitted changes. Is this an acceptable solution for you? 


Hi Gubbanoa,

Thank you for the suggestion. Can you try deleting the Git folder and recloning it in the workspace from the automation script instead? Let me know if this does not work for your workflow; I would love to brainstorm an alternative solution or advocate for this feature.


6 REPLIES

Hi @Retired_mod, thanks for your reply.

My question was about programmatically doing what Databricks does on the Databricks Repository when I click on the Reset (hard) option in the UI; apologies for not emphasizing this clearly enough.

I tracked the network activity in the browser when clicking on that button and noticed it calls an endpoint https://<WORKSPACE_URL>/graphql/projectGitReset_ProjectGitModal which unfortunately doesn't seem to be public (it's not documented anywhere).

I realize git reset --hard can be a destructive operation, but in our use case the remote repositories should be the single source of truth at all times, which is why it would make sense to be able to programmatically hard reset a repo to a given branch when performing a repo update. Without this, an unstaged change in the Databricks Repo blocks calls to the Update a repo endpoint, which is what we'd like to avoid.


nicole_lu_PM
Databricks Employee

Hi from the Git folders/Repos PM:

We don't currently have a solution to hard reset programmatically. This is on our roadmap.

If you orchestrate production jobs using Databricks Workflows, there is an option to use version-controlled source code in a Databricks job.

This should eliminate the possibility of executing uncommitted changes. Is this an acceptable solution for you? 
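For readers who want to see what that option looks like in practice, here is a minimal sketch of a job definition that runs a notebook directly from a Git branch instead of from a workspace Git folder. It assumes the Jobs API 2.1 create-job payload with a `git_source` block; the helper name `build_git_job` and every argument value are hypothetical placeholders, not something from this thread.

```python
import json


def build_git_job(name, git_url, provider, branch, notebook_path, cluster_id):
    """Build a Jobs API 2.1 create payload that checks out `branch` at run time.

    Assumption: the payload is later POSTed to /api/2.1/jobs/create.
    """
    return {
        "name": name,
        # The job clones this remote for every run, so uncommitted edits
        # in a workspace Git folder never reach the job.
        "git_source": {
            "git_url": git_url,
            "git_provider": provider,
            "git_branch": branch,
        },
        "tasks": [
            {
                "task_key": "main",
                "notebook_task": {
                    # Path relative to the repository root.
                    "notebook_path": notebook_path,
                    "source": "GIT",
                },
                "existing_cluster_id": cluster_id,
            }
        ],
    }


if __name__ == "__main__":
    payload = build_git_job(
        name="nightly-etl",                          # hypothetical
        git_url="https://example.com/org/repo.git",  # hypothetical
        provider="gitHub",
        branch="main",
        notebook_path="notebooks/etl",               # hypothetical
        cluster_id="0000-000000-example",            # hypothetical
    )
    print(json.dumps(payload, indent=2))
```

The job still needs Git credentials configured for the identity that runs it, which is the pain point raised in the next reply.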

Hi @nicole_lu_PM,

We do use version-controlled source code in our Databricks jobs for some use cases, but it too has some major shortcomings regarding PAT management and lifecycle for setting up Git credentials. I've documented them here and truly wish the engineering ergonomics would improve there.

Not sure if we have that option: we orchestrate jobs from Azure Data Factory. Is there a way to check out a particular branch for a job triggered from Azure Data Factory?

If there is not, we still need a "databricks repo update --branch main --force" command in our DevOps deploy script.

Hi Gubbanoa,

Thank you for the suggestion. Can you try deleting the Git folder and recloning it in the workspace from the automation script instead? Let me know if this does not work for your workflow; I would love to brainstorm an alternative solution or advocate for this feature.
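A rough sketch of that delete-and-reclone workaround for a deploy script follows. It is not an official hard reset. It assumes the Repos REST API (PATCH, DELETE and POST under /api/2.0/repos) and that a failed update returns a JSON body whose `error_code` is `GIT_CONFLICT`, as reported in the original question. The names `call_api`, `sync_repo` and `DatabricksApiError` are hypothetical helpers written for this example.

```python
import json
import urllib.error
import urllib.request


class DatabricksApiError(Exception):
    """Non-2xx API response; keeps the parsed error_code for callers."""

    def __init__(self, status, error_code, message):
        super().__init__(f"{status} {error_code}: {message}")
        self.status = status
        self.error_code = error_code


def call_api(host, token, method, path, body=None):
    """Send one JSON request to the workspace and return the decoded reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        f"{host}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req) as resp:
            raw = resp.read()
            return json.loads(raw) if raw else {}
    except urllib.error.HTTPError as err:
        raw = err.read()
        try:
            payload = json.loads(raw) if raw else {}
        except ValueError:
            payload = {}
        raise DatabricksApiError(
            err.code, payload.get("error_code"), payload.get("message")
        ) from err


def sync_repo(repo_id, branch, git_url, provider, path, api):
    """Update the Git folder to `branch`; on GIT_CONFLICT, delete and re-clone.

    `api(method, path, body=None)` performs the HTTP call, for example a
    functools.partial of call_api. Returns the ID of the Git folder that
    tracks `branch` afterwards (a new ID if the folder was re-created).
    """
    try:
        api("PATCH", f"/api/2.0/repos/{repo_id}", {"branch": branch})
        return repo_id
    except DatabricksApiError as err:
        if err.error_code != "GIT_CONFLICT":
            raise  # unrelated failure: do not delete anything
    # Destructive step: uncommitted workspace changes are lost here.
    api("DELETE", f"/api/2.0/repos/{repo_id}")
    created = api(
        "POST",
        "/api/2.0/repos",
        {"url": git_url, "provider": provider, "path": path},
    )
    # A fresh clone starts on the remote's default branch; switch explicitly.
    api("PATCH", f"/api/2.0/repos/{created['id']}", {"branch": branch})
    return created["id"]
```

In a pipeline, `api` could be `functools.partial(call_api, host, token)`. Because re-cloning yields a new repo ID, anything that stores the old ID would need to be updated with the returned value.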

nicole_lu_PM
Databricks Employee

Thank you for the feedback there! We recently added more docs for SP OAuth support for DevOps. SP OAuth support for GitHub is being discussed.
