Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Allow serverless compute to connect to a Postgres DB

jeremy98
Contributor III

Hi Community,


Is it possible to enable VNet peering between Databricks Serverless Compute and a private PostgreSQL database that is already configured with a VNet?

Currently, everything works fine when I create my personal cluster because I have set up the VNet peering. However, when I try to attach a notebook to a Serverless Compute instance, calls to my private database (hosted in Azure) fail, and I cannot determine the cause.

What is the procedure to enable VNet peering for Serverless Compute? How can I discover the IPs used by Serverless Compute to allow access to my database?
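Before digging into peering, it can help to confirm from a serverless notebook whether the database host is reachable at all, which distinguishes a networking problem from a credentials or driver problem. A minimal sketch using only the standard library; the hostname is a hypothetical placeholder for your private PostgreSQL endpoint:

```python
import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failure, connection refused, and timeouts alike.
        return False

# Hypothetical private endpoint; replace with your own server name:
# print(can_reach("my-postgres.postgres.database.azure.com", 5432))
```

If this returns False from serverless but True from your personal cluster, the issue is network routing rather than the database itself.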

10 REPLIES

Rjdudley
Valued Contributor II

Have you tried Lakehouse Federation for this request?  That might be a better way, depending on what you're doing.

Otherwise, I think you need to use an NCC to set up the trust between serverless and your PostgreSQL, see https://learn.microsoft.com/en-us/azure/databricks/security/network/serverless-network-security/serv... (I know the URL says Private Link, it's not Private Link).
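Once networking is in place, Lakehouse Federation itself is only a couple of SQL statements. A hedged sketch of what they look like, run from a Databricks notebook; the connection name, catalog name, host, database, and secret scope are all placeholders, not anything from this thread:

```python
# Lakehouse Federation setup statements for a PostgreSQL source.
# All names below (pg_conn, pg_catalog, host, secret scope) are placeholders.
create_connection_sql = """
CREATE CONNECTION IF NOT EXISTS pg_conn TYPE postgresql
OPTIONS (
  host 'my-postgres.postgres.database.azure.com',
  port '5432',
  user secret('pg_scope', 'pg_user'),
  password secret('pg_scope', 'pg_password')
)
"""

create_catalog_sql = """
CREATE FOREIGN CATALOG IF NOT EXISTS pg_catalog
USING CONNECTION pg_conn
OPTIONS (database 'mydb')
"""

# In a Databricks notebook you would run:
# spark.sql(create_connection_sql)
# spark.sql(create_catalog_sql)
# spark.sql("SELECT * FROM pg_catalog.public.my_table LIMIT 10").show()
```

After the foreign catalog exists, the Postgres tables are queryable like any other Unity Catalog tables, with no data copied until you read them.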

jeremy98
Contributor III

Hi,

Thanks for your answer. What is Lakehouse Federation? Note that the PostgreSQL instance is created inside Azure, so we essentially need to enable some VNet configuration. I was reading your solution and I hope it will work, since it's important for us to migrate data from our current architecture to a new one in Databricks.

Rjdudley
Valued Contributor II

jeremy98
Contributor III

Ah ok, thanks. I already did that, but we still need to migrate to Databricks.

Rjdudley
Valued Contributor II

Is that PostgreSQL server going to go away after you migrate to Databricks, or is it going to continue to be used?  Either way, federation works for you.  If you're going to discontinue it, just do a full extract into an archive location and a one-time ETL from that.  If it will remain in service, you can treat it as a landing location and have extraction notebooks load bronze from it.
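The one-time full extract described above can be sketched as a JDBC read archived to Parquet. This is a sketch under assumed names; the host, database, credentials, and archive path are placeholders, and in practice the credentials would come from a secret scope rather than literals:

```python
# One-time extract: read each PostgreSQL table over JDBC and archive it
# as Parquet before decommissioning the server. All names are placeholders.
jdbc_url = "jdbc:postgresql://my-postgres.postgres.database.azure.com:5432/mydb"

def extract_table(spark, table: str, archive_root: str = "/mnt/archive"):
    """Full extract of one table to Parquet; run once per table."""
    (spark.read.format("jdbc")
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("user", "pg_user")          # use dbutils.secrets in practice
        .option("password", "pg_password")  # placeholder, never hard-code
        .option("driver", "org.postgresql.Driver")
        .load()
        .write.mode("overwrite")
        .parquet(f"{archive_root}/{table}"))

# In a notebook: extract_table(spark, "public.orders")
```

If the server stays in service as a landing zone, the same read becomes the first step of the bronze-loading notebook instead of a one-off.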

jeremy98
Contributor III

Hi,
Thanks for your answer. We don't know yet whether it will remain, but since we have a medallion architecture, keeping the data in Parquet files is better because the data size is smaller than in Postgres.
However, at the end of our medallion architecture we have a new Postgres database where we need to store the data for our portal. So I'd say we need serverless to fetch data faster, without waiting for a job compute cluster to spin up.

But if my Postgres wasn't created with a private endpoint, do I need to recreate the database? My colleague and I provisioned it with Terraform, and I ran into a problem with this...

Rjdudley
Valued Contributor II

@jeremy98 wrote:

But if my Postgres wasn't created with a private endpoint, do I need to recreate the database? My colleague and I provisioned it with Terraform, and I ran into a problem with this...


Not necessarily, but this will depend on how your company's networking and access permissions are set up.  In my case, we are migrating from a PostgreSQL server which has an IP access list.  Before we decommission the application and database, we'll do a final archive of the data.  We have a couple of options for accessing the data: remove the IP access list, knowing we are going to tear down the server shortly, or add a NAT gateway with a fixed IP, add that IP to the PostgreSQL server's allowlist, and then extract the data.
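The NAT-gateway option above amounts to a few Azure CLI steps. A provisioning sketch with assumed resource names (my-rg, my-vnet, my-subnet, my-pg, and so on are placeholders, not anything from this thread), shown for orientation rather than as a definitive recipe:

```shell
# Give the VNet a stable egress IP via a NAT gateway (all names are placeholders).
az network public-ip create -g my-rg -n nat-ip --sku Standard --allocation-method Static
az network nat gateway create -g my-rg -n my-natgw --public-ip-addresses nat-ip
az network vnet subnet update -g my-rg --vnet-name my-vnet -n my-subnet --nat-gateway my-natgw

# Read back the fixed IP and allow it on the PostgreSQL server's firewall:
az network public-ip show -g my-rg -n nat-ip --query ipAddress -o tsv
az postgres flexible-server firewall-rule create -g my-rg -n my-pg \
  --rule-name allow-databricks --start-ip-address <NAT_IP> --end-ip-address <NAT_IP>
```

With that in place, all outbound traffic from the subnet presents the one static IP, which is what makes an allowlist entry workable.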

There are other ways as well, but they would be particular to your company's infrastructure.  This is the point where an internet rando says "it depends": talking to your DevOps or networking team is the way to figure out the exact steps.  If they have a specific recommendation, maybe the forums can help with that.

As an aside, you'll need to set up a NAT Gateway before the end of September anyway, see https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/secure-cluster-connectiv....  It might be to your benefit to start there.

jeremy98
Contributor III

Hmm, okay. So if I don't want to use an NCC, because Terraform says I'd need to recreate the database with that endpoint enabled, how can I configure serverless compute to communicate with a PostgreSQL database inside my Azure account that sits in a specific VNet?

Rjdudley
Valued Contributor II

I'm unclear on what you're asking here.  The NCC wouldn't connect serverless directly to your database; the NCC connects serverless to the VNet where your workspace runs.  You'd then use Lakehouse Federation to connect the workspace to your database.  If that won't work for you, I think you need to engage your account support team.  They can work with you 1:1 and help figure out exactly what works in your environment.
