Databricks Community

ThiagoRosetti · 05-13-2026

Hi everyone,

I'm facing two specific issues in my Databricks Premium workspace (AWS - sa-east-1).

Serverless Connectivity Issue: When using Serverless compute, I can successfully call APIs ending in .com, but calls to .com.br domains fail with connection/DNS errors. The exact same code works fine when running on a Classic Cluster.

VPC Setup: Custom VPC with Unity Catalog enabled.
Security Groups: Outbound rules are open for port 443 (0.0.0.0/0).
Symptom: It feels like a DNS resolution or Egress filtering issue specific to Serverless.

Classic Cluster Spark Hang: On the other hand, when I switch to a Classic Cluster to bypass the connectivity issue, any Spark command (e.g., spark.read or simple transformations) hangs indefinitely without starting the job.

Has anyone experienced this specific behavior where Serverless ignores certain TLDs or where Spark fails to initialize on Classic Clusters in the same VPC?

Thanks in advance!

(pt-br, translated)

Hi everyone,

I'm facing two separate problems in my Premium workspace (AWS, sa-east-1 region):

Serverless connectivity: I can't call APIs ending in .com.br from Serverless compute. If the API is on a .com domain, it works normally. The same code works on a Classic Cluster, which suggests that Serverless handles DNS or network egress differently.

I have already checked the Security Groups, and port 443 is open to 0.0.0.0/0.

Spark "loading forever" on the cluster: To work around the problem above, I tried a regular cluster. The API request code works, but any Spark command (such as reading a dataframe or a simple count) keeps processing indefinitely and never starts the job.

Has anyone run into something similar, or does anyone know of a VPC/Unity Catalog setting that could be causing this conflict between compute type and domain resolution?

Thanks!

GaneshI · a month ago

Hi there,

Great breakdown of the symptoms — these are actually two distinct issues likely sharing a common root cause in your VPC/network configuration. Let me address both:

Issue 1: Serverless Compute — .com.br DNS Resolution Failure

Root Cause

Serverless compute in Databricks does NOT run inside your custom VPC. It runs in a Databricks-managed network and egresses through Databricks' own infrastructure. This means:

Your VPC's outbound Security Group rules (0.0.0.0/0 on port 443) do not apply to Serverless
Serverless traffic goes through Databricks-controlled egress, which may have its own DNS resolvers and egress filtering
.com.br TLD resolution can fail if the managed DNS used by Serverless doesn't properly resolve country-code TLDs (ccTLDs) or if those domains are not on Databricks' egress allowlist

Fix for Serverless Connectivity

Option 1 — Use Serverless Network Policies (Recommended)

Databricks introduced Serverless Network Policies to control egress from Serverless compute. You need to explicitly allow the .com.br destinations:

Go to Account Console → Network → Serverless Network Policies
Add an egress policy that explicitly allows the target .com.br domains/IPs
This is the correct and supported way to control Serverless egress — Security Groups alone won't work

Option 2 — Contact Databricks Support

If the .com.br domains are being blocked at the Databricks-managed egress layer (not your VPC), you'll need Support to confirm whether those ccTLDs are filtered and to whitelist them at the platform level for your workspace in sa-east-1.

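Option 3 — Verify DNS explicitly

In a Serverless notebook, run:

python

import socket

# Resolve the target host; replace the placeholder with the real .com.br domain.
try:
    print(socket.getaddrinfo("yourtarget.com.br", 443))
except socket.gaierror as e:
    # gaierror is raised only when name resolution itself fails
    print(f"DNS failed: {e}")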

This confirms whether it's a DNS resolution failure vs. a TCP/TLS connection block — important distinction for Support.
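To separate those cases in a single pass, a minimal sketch along the following lines could report the first stage that fails. The `probe` helper and its return shape are illustrative assumptions, not part of any Databricks API; it uses only the Python standard library, and `yourtarget.com.br` remains a placeholder.

```python
import socket
import ssl


def probe(host, port=443, timeout=5.0):
    """Report the first failing stage: 'dns', 'tcp', 'tls', or 'ok'."""
    # Stage 1: name resolution only, no connection attempted yet.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as e:
        return {"stage": "dns", "error": str(e)}

    # Stage 2: plain TCP connect to the first resolved address.
    sockaddr = infos[0][4]
    try:
        sock = socket.create_connection(sockaddr[:2], timeout=timeout)
    except OSError as e:
        return {"stage": "tcp", "error": str(e)}

    # Stage 3: TLS handshake with certificate and hostname verification.
    try:
        ctx = ssl.create_default_context()
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return {"stage": "ok", "error": None, "tls_version": tls.version()}
    except (ssl.SSLError, OSError) as e:
        sock.close()
        return {"stage": "tls", "error": str(e)}


# Example call from a notebook cell (placeholder host):
# print(probe("yourtarget.com.br"))
```

A result of `dns` means the name never resolved, `tcp` suggests the address resolved but the connection was blocked or timed out, and `tls` points at something interfering after the connection was established. Running the same call on Serverless and on a Classic Cluster gives Support a direct comparison.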

Issue 2: Classic Cluster — Spark Hanging Indefinitely

Root Cause

Classic clusters do run inside your VPC, so this is almost certainly a VPC networking/configuration problem. A Spark job hanging without starting (not failing — just hanging) typically points to:

Cause	Explanation
Driver ↔ Executor communication blocked	Security Groups may block internal cluster traffic on required ports
S3 / Metastore connectivity issue	Unity Catalog metastore or S3 access is blocked, causing Spark context init to stall
Missing VPC Endpoints	Required AWS endpoints (S3, STS, KMS) may be missing, causing timeouts
DNS resolution failure inside VPC	Custom VPC may have DNS hostnames/resolution not enabled

Fix for Classic Cluster Spark Hang

Step 1 — Check VPC DNS Settings (Most Common Fix)

In AWS Console → Your VPC → Actions:

Enable DNS hostnames → must be Yes
Enable DNS resolution → must be Yes

If either is disabled, Spark nodes can't resolve each other or AWS service endpoints — causing silent hangs.
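The same two settings can also be read programmatically. The sketch below assumes boto3's EC2 `describe_vpc_attribute` call and takes the client as a parameter so the helper itself has no AWS dependency; `vpc_dns_flags` and the VPC ID in the usage comment are hypothetical names for illustration.

```python
def vpc_dns_flags(ec2_client, vpc_id):
    """Return the VPC's two DNS attributes as booleans."""
    # The EC2 API returns one attribute per call, so two calls are needed.
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )["EnableDnsSupport"]["Value"]
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )["EnableDnsHostnames"]["Value"]
    return {"enableDnsSupport": support, "enableDnsHostnames": hostnames}


# Usage (requires boto3 and AWS credentials; the VPC ID is a placeholder):
# import boto3
# ec2 = boto3.client("ec2", region_name="sa-east-1")
# print(vpc_dns_flags(ec2, "vpc-0123456789abcdef0"))
```

Both values should come back as `True` for the workspace VPC.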

Step 2 — Verify Security Group Inbound Rules for Internal Traffic

Databricks Classic Clusters require self-referencing inbound rules in the Security Group:

Type	Protocol	Port Range	Source
All TCP	TCP	0–65535	Same Security Group ID
All UDP	UDP	0–65535	Same Security Group ID

Without this, Driver and Executor nodes can't communicate — Spark will silently hang.
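One way to check for these rules is to inspect a security group description as returned by boto3's `describe_security_groups`. The sketch below is a plain function over that dictionary; `has_self_reference` is a hypothetical helper, and it checks inbound rules only.

```python
def has_self_reference(security_group, protocol):
    """True if an inbound rule allows all ports of `protocol` from the group itself."""
    group_id = security_group["GroupId"]
    for rule in security_group.get("IpPermissions", []):
        proto = rule.get("IpProtocol")
        # "-1" means all protocols and all ports.
        all_traffic = proto == "-1"
        all_ports = (
            proto == protocol
            and rule.get("FromPort") == 0
            and rule.get("ToPort") == 65535
        )
        # A self-referencing rule lists the group's own ID as its source.
        from_self = any(
            pair.get("GroupId") == group_id
            for pair in rule.get("UserIdGroupPairs", [])
        )
        if (all_traffic or all_ports) and from_self:
            return True
    return False


# Usage (requires boto3 and AWS credentials; the group ID is a placeholder):
# import boto3
# ec2 = boto3.client("ec2", region_name="sa-east-1")
# sg = ec2.describe_security_groups(GroupIds=["sg-0123456789abcdef0"])["SecurityGroups"][0]
# print(has_self_reference(sg, "tcp"), has_self_reference(sg, "udp"))
```

If either call returns `False`, the corresponding row from the table above is missing.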

Step 3 — Verify Required VPC Endpoints Exist

For Unity Catalog + AWS in a custom VPC, these endpoints are strongly recommended:

com.amazonaws.sa-east-1.s3 (Gateway type)
com.amazonaws.sa-east-1.sts
com.amazonaws.sa-east-1.kinesis-streams (if using streaming)

In a private subnet with no NAT gateway or other route to these services, missing S3 or STS endpoints can cause Spark to stall during initialization.
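As a rough check, the list returned by boto3's `describe_vpc_endpoints` can be compared against the service names above. `missing_endpoints` is a hypothetical helper, and the default service list (S3 and STS only) is an assumption taken from this step.

```python
def missing_endpoints(vpc_endpoints, region, services=("s3", "sts")):
    """Return the services that have no available endpoint in the given list."""
    # Collect service names of endpoints that are in the "available" state.
    present = {
        ep["ServiceName"]
        for ep in vpc_endpoints
        if str(ep.get("State", "")).lower() == "available"
    }
    return [s for s in services if f"com.amazonaws.{region}.{s}" not in present]


# Usage (requires boto3 and AWS credentials; the VPC ID is a placeholder):
# import boto3
# ec2 = boto3.client("ec2", region_name="sa-east-1")
# eps = ec2.describe_vpc_endpoints(
#     Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]
# )["VpcEndpoints"]
# print(missing_endpoints(eps, "sa-east-1"))
```

An empty list means both endpoints exist. A non-empty list is only a problem if the subnet has no other route to those services.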

Step 4 — Check Cluster Event Logs

In the Databricks UI → Cluster → Event Log tab, look for timeout or unreachable host errors that may not surface in the notebook itself.

Likely Common Root Cause

Both issues point to incomplete network configuration, although in different places (the Databricks-managed Serverless network versus your custom VPC in sa-east-1):

Serverless Issue → VPC rules don't apply; need Serverless Network Policy for .com.br
Classic Hang → Missing self-referencing SG rules OR DNS not enabled in VPC

Recommended action order: