How to use serverless clusters in DAB deployments with Unity Catalog in private network?

Charansai
New Contributor III

Hi everyone,

I’m deploying Jobs and Pipelines using Databricks Asset Bundles (DAB) in an Azure Databricks workspace configured with private networking. I’m trying to use serverless compute for some workloads, but I’m running into issues when Unity Catalog-backed storage accounts are also in private networks.

Here’s what I’ve done so far:

  • Defined the cluster in databricks.yml using the compute block with mode: serverless

  • The workspace is configured with Unity Catalog, and the backing storage accounts (like sysdlh) are also private

  • During deployment, the cluster fails to access the Unity Catalog storage paths

Questions:

  1. Where exactly should I define serverless compute in the databricks.yml? Is it under resources.jobs.job_name.compute or in a shared compute block?

  2. Are there known limitations when using serverless clusters in private network setups? Does serverless compute require public access to Unity Catalog storage accounts?

  3. Is there a recommended workaround to use serverless clusters with Unity Catalog in private environments? Should I switch to shared or single-user clusters with VNet injection instead?

  4. Any best practices for configuring DAB deployments with Unity Catalog + private networking? Especially for ensuring cluster access to storage paths during job execution.

Would appreciate any insights or examples from others who’ve tackled this setup!

1 REPLY

Coffee77
Contributor III

A lot of questions 😀

Concerning usage of serverless compute in databricks.yml, and assuming you're using it for jobs, you configure it at the job definition level. Take a look here: https://github.com/databricks/bundle-examples/tree/main/knowledge_base/serverless_job Notice how there is no explicit reference to an "existing all-purpose" or classic "jobs compute" cluster — omitting the cluster configuration is what makes the tasks run on serverless.
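
A minimal sketch of that shape (the bundle, job, and notebook names below are placeholders, not taken from the linked example):

```yaml
# databricks.yml — minimal serverless job sketch; names and paths are placeholders.
bundle:
  name: serverless-demo

resources:
  jobs:
    serverless_demo_job:
      name: serverless_demo_job
      tasks:
        - task_key: main
          notebook_task:
            notebook_path: ../src/main_notebook.ipynb
          # No job_clusters, new_cluster, or existing_cluster_id here:
          # with no compute specified, the task runs on serverless.
```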

Concerning access to the private storage accounts backing your Unity Catalog managed tables, you must configure their firewalls to allow serverless compute; otherwise serverless clusters are not allowed to reach them. The same applies whether it's a serverless jobs/notebook cluster or a serverless SQL warehouse. Take a look here: https://docs.databricks.com/aws/en/security/network/serverless-network-security/serverless-firewall

If you switch to all-purpose or jobs compute, there are pros and cons. I really like that serverless compute starts workloads very fast, while jobs compute takes minutes to spin up. In my case that delay is completely unacceptable, so I use already-running all-purpose clusters and/or serverless compute, depending on the type of job workload. There's a lot more to say on pros and cons, but I'm not going to copy/paste content from ChatGPT xDD Take a look here: https://docs.databricks.com/gcp/en/compute/choose-compute
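
As an illustration of mixing the two, a single DAB job can pin one task to an already-running all-purpose cluster and leave another task on serverless (a sketch; the cluster ID and notebook paths are placeholders):

```yaml
resources:
  jobs:
    mixed_compute_job:
      name: mixed_compute_job
      tasks:
        - task_key: heavy_etl
          # Placeholder ID of an existing all-purpose cluster that is already running.
          existing_cluster_id: "0000-000000-abcdefgh"
          notebook_task:
            notebook_path: ../src/etl_notebook.ipynb
        - task_key: quick_check
          depends_on:
            - task_key: heavy_etl
          notebook_task:
            notebook_path: ../src/check_notebook.ipynb
          # No cluster reference on this task, so it runs on serverless.
```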

I would also recommend using service principals to run DAB deployments from CI/CD pipelines, or even manually via the Databricks CLI while you're learning how it works.
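
For example, a bundle target can be configured to deploy and run as a service principal via run_as (a sketch; the workspace host and application ID below are placeholders):

```yaml
# databricks.yml — hypothetical target running as a service principal.
targets:
  prod:
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net  # placeholder host
    run_as:
      # Placeholder Azure service principal application ID.
      service_principal_name: "11111111-2222-3333-4444-555555555555"
```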


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData
