Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Is it unusual that I need to start a compute cluster to sync with Git?

397973
New Contributor III

I would guess unusual but want to hear from others before I nag my managers about it. 

In Databricks (which I access in a web browser) we have a compute cluster specifically for Git; you need to start it to push code or even to change branches. This is separate from the clusters you use to run pipelines.

This one cluster is available to everyone. Sometimes it is already running, but usually I need to start it before any Git action. It has a timeout of 60 minutes, so it's usually not running.

When I've asked managers, they say "oh yeah, that's how they set it up. Don't know why" ("they" means the Databricks COE).

I have the VS Code Databricks extension set up but haven't used it much. If I use VS Code, can I sidestep this whole thing?

Does anyone else do this? It doesn't make any sense. 

Accepted Solutions

MoJaMa
Databricks Employee

It means you are on the older, classic Git server proxy, which establishes connectivity from the Databricks control plane to your on-prem Git server. If your Git server were cloud-based, you would not need the proxy cluster.

That being said, the new way is this: https://docs.databricks.com/aws/en/repos/serverless-private-git

The "Why use Serverless Private Git?" section of that page explains why:

Why use Serverless Private Git?

Compared to Git server proxy, Serverless Private Git offers the following advantages:

  • Serverless Private Git acquires serverless compute only when it receives a Git request, and it can be inactive when not in use. In contrast, the Git proxy requires the proxy cluster to be active when the user submits a Git request.
  • Serverless Private Git uses PrivateLink to securely connect to the private Git instance.
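If you stay on the classic proxy for a while, the "proxy cluster must be active" requirement can at least be scripted. The following is only a rough sketch, not an official Databricks tool: it assumes the Clusters REST API endpoints `/api/2.1/clusters/get` and `/api/2.1/clusters/start`, a personal access token, and that you have permission to start the proxy cluster. The function names and the `GIT_PROXY_CLUSTER_ID` variable are hypothetical.

```python
import json
import time
import urllib.request


def call_api(host, token, method, path, payload=None):
    """Send one authenticated request to the workspace REST API and return parsed JSON."""
    data = json.dumps(payload).encode() if payload is not None else None
    req = urllib.request.Request(
        f"{host}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
    return json.loads(body) if body else {}


def ensure_cluster_running(cluster_id, api, poll_seconds=15, max_polls=60, sleep=time.sleep):
    """Start the cluster if it is terminated, then poll until it reports RUNNING.

    `api` is any callable with the shape api(method, path, payload=None) -> dict,
    which keeps this function independent of how requests are actually sent.
    """
    start_requested = False
    for _ in range(max_polls):
        info = api("GET", f"/api/2.1/clusters/get?cluster_id={cluster_id}")
        state = info["state"]
        if state == "RUNNING":
            return state
        if state == "TERMINATED" and not start_requested:
            # Request a start only once; later polls just wait for RUNNING.
            api("POST", "/api/2.1/clusters/start", {"cluster_id": cluster_id})
            start_requested = True
        sleep(poll_seconds)
    raise TimeoutError(f"cluster {cluster_id} did not reach RUNNING in time")


# Example wiring (assumed environment variables, not run here):
#   import os
#   from functools import partial
#   api = partial(call_api, os.environ["DATABRICKS_HOST"], os.environ["DATABRICKS_TOKEN"])
#   ensure_cluster_running(os.environ["GIT_PROXY_CLUSTER_ID"], api)
```

Running something like this before a push or branch switch only hides the wait; it does not remove the dependency on the proxy cluster, which is what Serverless Private Git is meant to do.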
