yesterday
Queries executed via Databricks Connect v17 (Spark Connect / gRPC) on
serverless compute COMPLETE SUCCESSFULLY on the server side (Spark tasks
finish, results are produced), but the Spark Connect gRPC channel FAILS
TO DELIVER results back to the client application. The client receives
nothing, waits, and eventually cancels the query after its timeout.
This issue is 100% exclusive to Spark Connect. The Databricks SQL
Connector (poll-based HTTP) on the same data, same network, same user
has ZERO cancellations.
ENVIRONMENT:
------------
• databricks-connect version: 17 (latest)
• Client: External Python application via Databricks Connect
• Compute: Serverless (SERVERLESS_COMPUTE)
• Protocol: SPARK_CONNECT (gRPC / HTTP2)
EXACT FAILURE FLOW:
-------------------
1. Client app sends query via Databricks Connect (gRPC) → serverless
2. Serverless executes query → Spark tasks complete, results produced
3. *** Server FAILS to stream results back via gRPC ***
   (result_fetch_duration_ms = 0 → result delivery never starts)
4. Client waits... receives nothing... hits app timeout
5. Client cancels query/session
6. Query recorded as CANCELED in query history
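The wait-then-cancel behaviour in steps 4-5 can be sketched with a small client-side guard. This is a hypothetical helper, not part of Databricks Connect; `fetch_fn` stands in for whatever call blocks on results (e.g. `df.collect`):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def fetch_with_timeout(fetch_fn, timeout_s=60):
    """Run a blocking result fetch, giving up after timeout_s seconds.

    Helps distinguish 'server still working' from 'channel silently dead':
    a genuine error surfaces immediately, while a severed gRPC stream
    just blocks forever until this timeout fires.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fetch_fn)
        try:
            return future.result(timeout=timeout_s)
        except FutureTimeout:
            future.cancel()
            raise RuntimeError(
                f"No results after {timeout_s}s -- matches the "
                "hung-stream pattern described above"
            )

# Placeholder usage; in the real app fetch_fn would be df.collect
print(fetch_with_timeout(lambda: "ok", timeout_s=5))
```

Wrapping the fetch like this turns an indefinite hang into a diagnosable timeout with a clear error message.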
yesterday
@subray Have you tried limiting the data to see if it works?
yesterday
Yes, I can see the query completes on the Databricks side -- results are generated but not returned.
yesterday
This is a well-known class of issue with gRPC/HTTP2 long-lived streams being killed by network intermediaries. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the key diagnostic clue.
Databricks Connect uses gRPC over HTTP/2, which maintains a long-lived streaming connection. During query execution on the server, this connection appears idle from the network's perspective (no data flowing client-ward). Network devices between your client and Databricks -- corporate proxies, firewalls, load balancers, WAFs, or NAT gateways -- often have idle connection timeouts that terminate connections they consider inactive.
The failure sequence: the query executes server-side while the gRPC stream sits idle, a network intermediary silently drops the connection, and by the time the server tries to stream results the channel is already dead. This explains why result_fetch_duration_ms = 0 -- the result delivery channel was severed before streaming could begin.
Step 1: Confirm the network theory
Test from a machine with direct internet access (no corporate proxy/VPN):
from databricks.connect import DatabricksSession
spark = DatabricksSession.builder.remote(
host="https://<workspace>.cloud.databricks.com",
token="<pat>",
cluster_id="serverless"
).getOrCreate()
# Run a query that takes 30+ seconds
df = spark.sql("SELECT *, sha2(cast(id as string), 256) FROM range(10000000)")
result = df.collect()
print(f"Got {len(result)} rows")
If this works from a clean network but fails from your corporate network, the issue is confirmed as a network intermediary.
Step 2: Check for proxies
# Check if HTTP/HTTPS proxy is configured
echo $HTTP_PROXY $HTTPS_PROXY $http_proxy $https_proxy
# Check if a corporate proxy intercepts traffic
curl -v https://<workspace>.cloud.databricks.com 2>&1 | grep -i proxy
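As a cross-check from inside Python (which is closer to what your client process actually sees), the standard library can report which proxies the process would pick up from the environment. This is a generic sketch, not Databricks-specific:

```python
import urllib.request

# Proxies Python resolves from HTTP_PROXY/HTTPS_PROXY/etc. -- if an
# entry appears for "https", your Spark Connect traffic is likely being
# routed through that intermediary.
proxies = urllib.request.getproxies()
print(proxies or "no proxy configured in this environment")
```

An empty dict here while `curl -v` still shows a proxy would suggest transparent interception at the network layer rather than an environment setting.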
Step 3: Enable gRPC debug logging
export SPARK_CONNECT_LOG_LEVEL=debug
export GRPC_TRACE=all
export GRPC_VERBOSITY=DEBUG
Then run your query and look for connection reset, stream closed, or EOF errors in the logs.
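Once the debug output is captured to a file, a grep along these lines narrows it to the termination events of interest. The sample log line below is fabricated purely to make the sketch self-contained; real GRPC_TRACE output is far noisier:

```shell
# Fabricated one-line sample standing in for real GRPC_TRACE output
printf 'I0000 transport: received GOAWAY frame\n' > grpc_debug.log

# The actual filter you would run over your captured logs
grep -Ei 'goaway|rst_stream|connection reset|stream closed|eof' grpc_debug.log
```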
Force the gRPC channel to send periodic PING frames, preventing intermediaries from treating the connection as idle:
import grpc
from databricks.connect import DatabricksSession
# Configure keepalive options
spark = DatabricksSession.builder.remote(
host="https://<workspace>.cloud.databricks.com",
token="<pat>",
cluster_id="serverless"
).header("grpc-keepalive-time-ms", "10000") \
.header("grpc-keepalive-timeout-ms", "5000") \
.getOrCreate()
If custom headers don't work for keepalive, try setting environment variables before creating the session:
import os
os.environ["GRPC_KEEPALIVE_TIME_MS"] = "10000" # Send ping every 10s
os.environ["GRPC_KEEPALIVE_TIMEOUT_MS"] = "5000" # Wait 5s for pong
os.environ["GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS"] = "1" # Ping even when idle
os.environ["GRPC_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS"] = "5000"
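As far as I know, grpc-python does not read `GRPC_KEEPALIVE_*` environment variables; its documented mechanism for keepalive is channel options. If the environment-variable route has no effect, these are the underlying knobs those variables are trying to reach, shown on a plain `grpc` channel (demonstration only -- the real Spark Connect channel is TLS-secured and constructed for you by databricks-connect):

```python
# Canonical gRPC Python keepalive settings, expressed as channel options.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 10_000),          # send a PING every 10s
    ("grpc.keepalive_timeout_ms", 5_000),        # wait 5s for the PING ack
    ("grpc.keepalive_permit_without_calls", 1),  # ping even with no active RPC
]

try:
    import grpc
    # Illustrative channel; swap in your real target and credentials.
    channel = grpc.insecure_channel("localhost:50051", options=KEEPALIVE_OPTIONS)
except ImportError:
    channel = None  # grpcio not installed in this environment
```

Whether these options can be injected into the channel that Databricks Connect builds depends on the client version; if the builder offers no hook, this at least confirms which settings you are trying to influence.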
If you're behind a corporate proxy, configure a proxy bypass for your Databricks workspace:
# Add to your environment
export NO_PROXY=".cloud.databricks.com,.azuredatabricks.net"
Or configure your proxy (Squid, Zscaler, etc.) to pass through HTTP/2 traffic to Databricks endpoints without terminating/re-establishing the connection.
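If you cannot change shell profiles on every client machine, the same bypass can be set from Python before the session is created (a sketch; the domain list mirrors the export above):

```python
import os

# Must run before DatabricksSession is built, since proxy settings are
# read when the connection is first established.
for var in ("NO_PROXY", "no_proxy"):  # some libraries only read one case
    os.environ[var] = ".cloud.databricks.com,.azuredatabricks.net"
```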
Large result sets take longer to stream, increasing the window for connection drops. Reduce what you pull to the client:
# Instead of collecting all rows
# df.collect() # BAD -- pulls everything via gRPC
# Option A: Limit rows
df.limit(10000).collect()
# Option B: Use toPandas with Arrow (more efficient streaming)
pdf = df.limit(10000).toPandas()
# Option C: Write results to a table, then read via SQL Connector
df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.results_temp")
# Then read with Databricks SQL Connector (HTTP-based, no gRPC issues)
Since the SQL Connector works on your network, use a hybrid approach -- Spark Connect for transformations, SQL Connector for result retrieval:
from databricks.connect import DatabricksSession
from databricks import sql
# Use Spark Connect for computation
spark = DatabricksSession.builder.remote(...).getOrCreate()
df = spark.sql("SELECT ... complex transformation ...")
df.write.mode("overwrite").saveAsTable("tmp.results")
# Use SQL Connector (HTTP) for result retrieval
with sql.connect(
server_hostname="<workspace>.cloud.databricks.com",
http_path="/sql/1.0/warehouses/<id>",
access_token="<pat>"
) as conn:
cursor = conn.cursor()
cursor.execute("SELECT * FROM tmp.results")
results = cursor.fetchall()
If you control the network infrastructure, increase the idle timeout on the device killing the connection:
| Device | Setting | Recommended Value |
|---|---|---|
| AWS ALB/NLB | Idle timeout | 300-3600 seconds |
| Azure Application Gateway | Connection idle timeout | 300+ seconds |
| Squid Proxy | connect_timeout / read_timeout | 3600 seconds |
| Zscaler | SSL inspection timeout | Bypass for Databricks |
| Corporate Firewall | TCP idle timeout | 3600 seconds |
If your network uses TLS inspection (MITM proxy), the gRPC channel may fail silently:
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH="/path/to/corporate-ca-bundle.crt"
Or add the corporate CA to Python's certificate store:
pip install certifi
cat /path/to/corporate-ca.pem >> $(python -c "import certifi; print(certifi.where())")
| Feature | Spark Connect (gRPC) | SQL Connector (HTTP) |
|---|---|---|
| Protocol | HTTP/2 long-lived stream | HTTP/1.1 request/response |
| Connection | Persistent bidirectional | Short-lived poll-based |
| During execution | Connection appears idle | No connection held open |
| Result delivery | Server pushes via stream | Client polls for results |
| Proxy compatibility | Poor (many proxies break HTTP/2) | Excellent |
The SQL Connector's poll-based model is inherently more resilient to network intermediaries because it doesn't maintain a long-lived connection that can be killed.
If none of the above solutions work, file a Databricks support ticket with the query ID and timestamps from query history, the client-side gRPC debug logs from Step 3, and the observation that result_fetch_duration_ms = 0 while the query is recorded as CANCELED.
This may be a platform-level issue that Databricks engineering needs to investigate, especially if the gRPC stream termination is happening within Databricks' own infrastructure rather than in your network.