<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: databricks-connect serverless GRPC issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154031#M54058</link>
    <description>&lt;P&gt;&lt;SPAN&gt;This is a well-known class of issue with &lt;/SPAN&gt;&lt;STRONG&gt;gRPC/HTTP2 long-lived streams being killed by network intermediaries&lt;/STRONG&gt;&lt;SPAN&gt;. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the key diagnostic clue.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Root Cause: Network Intermediaries Killing HTTP/2 Streams&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Databricks Connect uses &lt;/SPAN&gt;&lt;STRONG&gt;gRPC over HTTP/2&lt;/STRONG&gt;&lt;SPAN&gt;, which maintains a long-lived streaming connection. During query execution on the server, this connection appears &lt;/SPAN&gt;&lt;STRONG&gt;idle&lt;/STRONG&gt;&lt;SPAN&gt; from the network's perspective (no data flowing client-ward). Network devices between your client and Databricks -- corporate proxies, firewalls, load balancers, WAFs, or NAT gateways -- often have &lt;/SPAN&gt;&lt;STRONG&gt;idle connection timeouts&lt;/STRONG&gt;&lt;SPAN&gt; that terminate connections they consider inactive.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The failure sequence:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN&gt; Client opens gRPC stream to Databricks serverless&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Query executes on server (takes N seconds/minutes)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; During execution, the gRPC stream is "idle" (no response data yet)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Network intermediary kills the "idle" HTTP/2 connection&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Server finishes query, tries to stream results back&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Connection is already dead -- results have nowhere to go&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Client never receives data, eventually times out and cancels&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This explains why result_fetch_duration_ms = 0 -- the result delivery channel was severed before streaming could begin.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Diagnostic Steps&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Confirm the network theory&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Test from a machine with &lt;/SPAN&gt;&lt;STRONG&gt;direct internet access&lt;/STRONG&gt;&lt;SPAN&gt; (no corporate proxy/VPN):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks.connect import DatabricksSession&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;spark = DatabricksSession.builder.remote(&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;host="https://&amp;lt;workspace&amp;gt;.cloud.databricks.com",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;token="&amp;lt;pat&amp;gt;",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;serverless=True&amp;nbsp; # serverless is selected via this flag, not cluster_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;).getOrCreate()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Run a query that takes 30+ seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df = spark.sql("SELECT *, sha2(cast(id as string), 256) FROM range(10000000)")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;result = df.collect()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;print(f"Got {len(result)} rows")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If this works from a clean network but fails from your corporate network, the issue is confirmed as a network intermediary.&lt;/SPAN&gt;&lt;/P&gt;
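&lt;P&gt;&lt;SPAN&gt;A small timing wrapper (a hypothetical helper, not part of Databricks Connect) makes the Step 1 evidence concrete by recording how long the client waited before success or failure -- useful numbers for a support ticket:&lt;/SPAN&gt;&lt;/P&gt;

```python
import time

def timed(fn):
    """Run fn(), returning (result, seconds_elapsed, exception_or_None)."""
    start = time.monotonic()
    try:
        return fn(), time.monotonic() - start, None
    except Exception as exc:  # capture the raw failure rather than raising
        return None, time.monotonic() - start, exc

# Usage (assuming `df` from the snippet above):
# rows, elapsed, err = timed(df.collect)
# print(f"elapsed={elapsed:.1f}s err={err!r}")
```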
&lt;P&gt;&lt;STRONG&gt;Step 2: Check for proxies&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Check if HTTP/HTTPS proxy is configured&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;echo $HTTP_PROXY $HTTPS_PROXY $http_proxy $https_proxy&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Check if a corporate proxy intercepts traffic&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;curl -v https://&amp;lt;workspace&amp;gt;.cloud.databricks.com 2&amp;gt;&amp;amp;1 | grep -i proxy&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Enable gRPC debug logging&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export SPARK_CONNECT_LOG_LEVEL=debug&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export GRPC_TRACE=all&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export GRPC_VERBOSITY=DEBUG&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Then run your query and look for connection reset, stream closed, or EOF errors in the logs.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solutions&lt;/STRONG&gt;&lt;/H3&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 1: Configure gRPC Keepalive (Most Effective)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Force the gRPC channel to send periodic PING frames, preventing intermediaries from treating the connection as idle:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;import grpc&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks.connect import DatabricksSession&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Configure keepalive options&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;spark = DatabricksSession.builder.remote(&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;host="https://&amp;lt;workspace&amp;gt;.cloud.databricks.com",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;token="&amp;lt;pat&amp;gt;",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;serverless=True&amp;nbsp; # serverless is selected via this flag, not cluster_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;).header("grpc-keepalive-time-ms", "10000") \&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;.header("grpc-keepalive-timeout-ms", "5000") \&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;.getOrCreate()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Note that keepalive is normally a gRPC channel option rather than a request header, so custom headers may not take effect. If they don't, try setting environment variables before creating the session (whether the runtime honors these variables depends on your grpcio build):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;import os&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_KEEPALIVE_TIME_MS"] = "10000" &amp;nbsp; &amp;nbsp; &amp;nbsp; # Send ping every 10s&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_KEEPALIVE_TIMEOUT_MS"] = "5000"&amp;nbsp; &amp;nbsp; &amp;nbsp; # Wait 5s for pong&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS"] = "1"&amp;nbsp; # Ping even when idle&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS"] = "5000"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
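&lt;P&gt;&lt;SPAN&gt;For reference, keepalive is canonically configured as gRPC channel options passed at channel creation (per the gRPC keepalive guide), and the option names below are part of the public gRPC API. Whether Databricks Connect exposes a hook to pass them through is version-dependent, so treat this as an illustrative sketch rather than a supported Databricks API:&lt;/SPAN&gt;&lt;/P&gt;

```python
# Canonical gRPC keepalive configuration: channel options, not headers.
KEEPALIVE_OPTIONS = [
    ("grpc.keepalive_time_ms", 10000),           # send a PING every 10s
    ("grpc.keepalive_timeout_ms", 5000),         # wait 5s for the PING ack
    ("grpc.keepalive_permit_without_calls", 1),  # ping even with no active RPC
    ("grpc.http2.max_pings_without_data", 0),    # do not cap pings on idle streams
]

def build_channel(target):
    """Open a TLS channel with keepalive enabled (illustrative only)."""
    import grpc  # grpcio
    return grpc.secure_channel(
        target,
        grpc.ssl_channel_credentials(),
        options=KEEPALIVE_OPTIONS,
    )
```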
&lt;H3&gt;&lt;STRONG&gt;Solution 2: Bypass Corporate Proxy for Databricks Traffic&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If you're behind a corporate proxy, configure a proxy bypass for your Databricks workspace:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Add to your environment&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export NO_PROXY=".cloud.databricks.com,.azuredatabricks.net"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Or configure your proxy (Squid, Zscaler, etc.) to pass through HTTP/2 traffic to Databricks endpoints without terminating/re-establishing the connection.&lt;/SPAN&gt;&lt;/P&gt;
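&lt;P&gt;&lt;SPAN&gt;The same bypass can be applied in-process before the session is created, which helps when you cannot change the shell environment; both capitalizations are set because different HTTP stacks read different ones:&lt;/SPAN&gt;&lt;/P&gt;

```python
import os

# Extend NO_PROXY so proxy-aware clients bypass the corporate proxy
# for Databricks hosts; preserves any existing bypass list.
BYPASS = ".cloud.databricks.com,.azuredatabricks.net"
current = os.environ.get("NO_PROXY", "")
os.environ["NO_PROXY"] = ",".join([p for p in (current, BYPASS) if p])
os.environ["no_proxy"] = os.environ["NO_PROXY"]  # some libraries read lowercase
```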
&lt;H3&gt;&lt;STRONG&gt;Solution 3: Reduce Result Set Size&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Large result sets take longer to stream, increasing the window for connection drops. Reduce what you pull to the client:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Instead of collecting all rows&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# df.collect()&amp;nbsp; # BAD -- pulls everything via gRPC&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Option A: Limit rows&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df.limit(10000).collect()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Option B: Use toPandas with Arrow (more efficient streaming)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;pdf = df.limit(10000).toPandas()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Option C: Write results to a table, then read via SQL Connector&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.results_temp")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Then read with Databricks SQL Connector (HTTP-based, no gRPC issues)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 4: Switch to Databricks SQL Connector for Result Fetching&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Since the SQL Connector works on your network, use a &lt;/SPAN&gt;&lt;STRONG&gt;hybrid approach&lt;/STRONG&gt;&lt;SPAN&gt; -- Spark Connect for transformations, SQL Connector for result retrieval:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks.connect import DatabricksSession&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks import sql&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Use Spark Connect for computation&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;spark = DatabricksSession.builder.remote(...).getOrCreate()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df = spark.sql("SELECT ... complex transformation ...")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df.write.mode("overwrite").saveAsTable("tmp.results")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Use SQL Connector (HTTP) for result retrieval&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;with sql.connect(&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server_hostname="&amp;lt;workspace&amp;gt;.cloud.databricks.com",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;http_path="/sql/1.0/warehouses/&amp;lt;id&amp;gt;",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;access_token="&amp;lt;pat&amp;gt;"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;) as conn:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cursor = conn.cursor()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cursor.execute("SELECT * FROM tmp.results")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;results = cursor.fetchall()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
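&lt;P&gt;&lt;SPAN&gt;When the hybrid fetch must handle very large result sets, batched retrieval avoids holding everything in memory at once; fetchmany is part of the standard DB-API cursor interface the SQL Connector implements. A minimal sketch:&lt;/SPAN&gt;&lt;/P&gt;

```python
def iter_batches(cursor, batch_size=10000):
    """Yield lists of rows from a DB-API cursor until it is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield batch

# Usage with the cursor from the example above:
# for batch in iter_batches(cursor):
#     process(batch)
```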
&lt;H3&gt;&lt;STRONG&gt;Solution 5: Increase Timeout on Network Devices&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If you control the network infrastructure, increase the idle timeout on the device killing the connection:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Device&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Setting&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Recommended Value&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;AWS ALB/NLB&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Idle timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;300-3600 seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Azure Application Gateway&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Connection idle timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;300+ seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Squid Proxy&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;connect_timeout / read_timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;3600 seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Zscaler&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;SSL inspection timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Bypass for Databricks&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Corporate Firewall&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;TCP idle timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;3600 seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 6: Use SSL Certificate Path (If TLS Issues)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If your network uses TLS inspection (MITM proxy), the gRPC channel may fail silently:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH="/path/to/corporate-ca-bundle.crt"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Or add the corporate CA to Python's certificate store:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;pip install certifi&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;cat /path/to/corporate-ca.pem &amp;gt;&amp;gt; $(python -c "import certifi; print(certifi.where())")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
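&lt;P&gt;&lt;SPAN&gt;The corporate bundle can also be wired up for the Python HTTP stack and gRPC in one place: REQUESTS_CA_BUNDLE, SSL_CERT_FILE, and GRPC_DEFAULT_SSL_ROOTS_FILE_PATH are standard environment variables read by requests, OpenSSL-based clients, and gRPC respectively (the path below is the example path from above):&lt;/SPAN&gt;&lt;/P&gt;

```python
import os

# Point the common TLS stacks at the corporate CA bundle before any
# client library creates a connection.
CA_BUNDLE = "/path/to/corporate-ca-bundle.crt"
for var in ("REQUESTS_CA_BUNDLE", "SSL_CERT_FILE", "GRPC_DEFAULT_SSL_ROOTS_FILE_PATH"):
    os.environ[var] = CA_BUNDLE
```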
&lt;H3&gt;&lt;STRONG&gt;Why SQL Connector Works but Spark Connect Doesn't&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Feature&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Spark Connect (gRPC)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;SQL Connector (HTTP)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Protocol&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;HTTP/2 long-lived stream&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;HTTP/1.1 request/response&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Connection&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Persistent bidirectional&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Short-lived poll-based&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;During execution&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Connection appears idle&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;No connection held open&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Result delivery&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Server pushes via stream&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Client polls for results&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Proxy compatibility&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Poor (many proxies break HTTP/2)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Excellent&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&lt;SPAN&gt;The SQL Connector's poll-based model is inherently more resilient to network intermediaries because it doesn't maintain a long-lived connection that can be killed.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;When to File a Support Ticket&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If none of the above solutions work, file a Databricks support ticket with:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Workspace ID and region&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Query IDs of failed queries (from query history)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;The result&lt;/SPAN&gt;&lt;I&gt;&lt;SPAN&gt;fetch&lt;/SPAN&gt;&lt;/I&gt;&lt;SPAN&gt;duration_ms = 0 observation&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Network topology diagram (client to Databricks path)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;gRPC debug logs (SPARK&lt;/SPAN&gt;&lt;I&gt;&lt;SPAN&gt;CONNECT&lt;/SPAN&gt;&lt;/I&gt;&lt;SPAN&gt;LOG_LEVEL=debug)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Confirmation that SQL Connector works on same network&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;This may be a platform-level issue that Databricks engineering needs to investigate, especially if the gRPC stream termination is happening within Databricks' own infrastructure rather than in your network.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;References&lt;/STRONG&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/advanced" target="_blank"&gt;&lt;SPAN&gt;Databricks Connect Advanced Usage&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python/troubleshooting" target="_blank"&gt;&lt;SPAN&gt;Databricks Connect Troubleshooting&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/queries" target="_blank"&gt;&lt;SPAN&gt;Query Interruptions with Databricks Connect&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://grpc.io/docs/guides/keepalive/" target="_blank"&gt;&lt;SPAN&gt;gRPC Keepalive Guide&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/internal-grpc-errors-when-using-databricks-connect/td-p/112817" target="_blank"&gt;&lt;SPAN&gt;Internal gRPC Errors -- Databricks Community&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
    <pubDate>Fri, 10 Apr 2026 05:03:50 GMT</pubDate>
    <dc:creator>anuj_lathi</dc:creator>
    <dc:date>2026-04-10T05:03:50Z</dc:date>
    <item>
      <title>databricks-connect serverless GRPC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154016#M54053</link>
      <description>&lt;P&gt;Queries executed via Databricks Connect v17 (Spark Connect / gRPC) on&lt;BR /&gt;serverless compute COMPLETE SUCCESSFULLY on the server side (Spark tasks&lt;BR /&gt;finish, results are produced), but the Spark Connect gRPC channel FAILS&lt;BR /&gt;TO DELIVER results back to the client application. The client receives&lt;BR /&gt;nothing, waits, and eventually cancels the query after its timeout.&lt;/P&gt;&lt;P&gt;This issue is 100% exclusive to Spark Connect. The Databricks SQL&lt;BR /&gt;Connector (poll-based HTTP) on the same data, same network, same user&lt;BR /&gt;has ZERO cancellations.&lt;/P&gt;&lt;P&gt;ENVIRONMENT:&lt;BR /&gt;------------&lt;BR /&gt;• databricks-connect version: 17 (latest)&lt;BR /&gt;• Client: External Python application via Databricks Connect&lt;BR /&gt;• Compute: Serverless (SERVERLESS_COMPUTE)&lt;BR /&gt;• Protocol: SPARK_CONNECT (gRPC / HTTP2)&lt;/P&gt;&lt;P&gt;EXACT FAILURE FLOW:&lt;BR /&gt;-------------------&lt;BR /&gt;1. Client app sends query via Databricks Connect (gRPC) → serverless&lt;BR /&gt;2. Serverless executes query — Spark tasks complete, results produced&lt;BR /&gt;3. *** Server FAILS to stream results back via gRPC ***&lt;BR /&gt;(result_fetch_duration_ms = 0 — result delivery never starts)&lt;BR /&gt;4. Client waits... receives nothing... hits app timeout&lt;BR /&gt;5. Client cancels query/session&lt;BR /&gt;6. Query recorded as CANCELED in query history&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 04:25:09 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154016#M54053</guid>
      <dc:creator>subray</dc:creator>
      <dc:date>2026-04-10T04:25:09Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect serverless GRPC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154022#M54054</link>
<description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/226547"&gt;@subray&lt;/a&gt;&amp;nbsp;Have you tried limiting the data to see if it works?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 04:36:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154022#M54054</guid>
      <dc:creator>Sumit_7</dc:creator>
      <dc:date>2026-04-10T04:36:26Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect serverless GRPC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154024#M54055</link>
<description>&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="subray_0-1775795972722.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/25873i35AAD5D13D262030/image-size/medium?v=v2&amp;amp;px=400" role="button" title="subray_0-1775795972722.png" alt="subray_0-1775795972722.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Yes, I can see the query completes on the Databricks side; results are generated but not returned.&lt;/P&gt;</description>
      <pubDate>Fri, 10 Apr 2026 04:40:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154024#M54055</guid>
      <dc:creator>subray</dc:creator>
      <dc:date>2026-04-10T04:40:29Z</dc:date>
    </item>
    <item>
      <title>Re: databricks-connect serverless GRPC issue</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154031#M54058</link>
      <description>&lt;P&gt;&lt;SPAN&gt;This is a well-known class of issue with &lt;/SPAN&gt;&lt;STRONG&gt;gRPC/HTTP2 long-lived streams being killed by network intermediaries&lt;/STRONG&gt;&lt;SPAN&gt;. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the key diagnostic clue.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Root Cause: Network Intermediaries Killing HTTP/2 Streams&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Databricks Connect uses &lt;/SPAN&gt;&lt;STRONG&gt;gRPC over HTTP/2&lt;/STRONG&gt;&lt;SPAN&gt;, which maintains a long-lived streaming connection. During query execution on the server, this connection appears &lt;/SPAN&gt;&lt;STRONG&gt;idle&lt;/STRONG&gt;&lt;SPAN&gt; from the network's perspective (no data flowing client-ward). Network devices between your client and Databricks -- corporate proxies, firewalls, load balancers, WAFs, or NAT gateways -- often have &lt;/SPAN&gt;&lt;STRONG&gt;idle connection timeouts&lt;/STRONG&gt;&lt;SPAN&gt; that terminate connections they consider inactive.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;The failure sequence:&lt;/STRONG&gt;&lt;/P&gt;
&lt;OL&gt;
&lt;LI&gt;&lt;SPAN&gt; Client opens gRPC stream to Databricks serverless&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Query executes on server (takes N seconds/minutes)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; During execution, the gRPC stream is "idle" (no response data yet)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Network intermediary kills the "idle" HTTP/2 connection&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Server finishes query, tries to stream results back&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Connection is already dead -- results have nowhere to go&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI&gt;&lt;SPAN&gt; Client never receives data, eventually times out and cancels&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/OL&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;This explains why result_fetch_duration_ms = 0 -- the result delivery channel was severed before streaming could begin.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Diagnostic Steps&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;STRONG&gt;Step 1: Confirm the network theory&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Test from a machine with &lt;/SPAN&gt;&lt;STRONG&gt;direct internet access&lt;/STRONG&gt;&lt;SPAN&gt; (no corporate proxy/VPN):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks.connect import DatabricksSession&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;spark = DatabricksSession.builder.remote(&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;host="https://&amp;lt;workspace&amp;gt;.cloud.databricks.com",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;token="&amp;lt;pat&amp;gt;",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;serverless=True&amp;nbsp; # serverless is selected via this flag, not cluster_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;).getOrCreate()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Run a query that takes 30+ seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df = spark.sql("SELECT *, sha2(cast(id as string), 256) FROM range(10000000)")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;result = df.collect()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;print(f"Got {len(result)} rows")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;If this works from a clean network but fails from your corporate network, the issue is confirmed as a network intermediary.&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 2: Check for proxies&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Check if HTTP/HTTPS proxy is configured&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;echo $HTTP_PROXY $HTTPS_PROXY $http_proxy $https_proxy&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Check if a corporate proxy intercepts traffic&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;curl -v https://&amp;lt;workspace&amp;gt;.cloud.databricks.com 2&amp;gt;&amp;amp;1 | grep -i proxy&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Step 3: Enable gRPC debug logging&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export SPARK_CONNECT_LOG_LEVEL=debug&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export GRPC_TRACE=all&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export GRPC_VERBOSITY=DEBUG&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Then run your query and look for connection reset, stream closed, or EOF errors in the logs.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solutions&lt;/STRONG&gt;&lt;/H3&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 1: Configure gRPC Keepalive (Most Effective)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Force the gRPC channel to send periodic PING frames, preventing intermediaries from treating the connection as idle:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;import grpc&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks.connect import DatabricksSession&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Configure keepalive options&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;spark = DatabricksSession.builder.remote(&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;host="https://&amp;lt;workspace&amp;gt;.cloud.databricks.com",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;token="&amp;lt;pat&amp;gt;",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;serverless=True&amp;nbsp; # serverless is selected via this flag, not cluster_id&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;).header("grpc-keepalive-time-ms", "10000") \&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;.header("grpc-keepalive-timeout-ms", "5000") \&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;.getOrCreate()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Note that keepalive is normally a gRPC channel option rather than a request header, so custom headers may not take effect. If they don't, try setting environment variables before creating the session (whether the runtime honors these variables depends on your grpcio build):&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;import os&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_KEEPALIVE_TIME_MS"] = "10000" &amp;nbsp; &amp;nbsp; &amp;nbsp; # Send ping every 10s&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_KEEPALIVE_TIMEOUT_MS"] = "5000"&amp;nbsp; &amp;nbsp; &amp;nbsp; # Wait 5s for pong&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_KEEPALIVE_PERMIT_WITHOUT_CALLS"] = "1"&amp;nbsp; # Ping even when idle&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;os.environ["GRPC_HTTP2_MIN_RECV_PING_INTERVAL_WITHOUT_DATA_MS"] = "5000"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 2: Bypass Corporate Proxy for Databricks Traffic&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If you're behind a corporate proxy, configure a proxy bypass for your Databricks workspace:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Add to your environment&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export NO_PROXY=".cloud.databricks.com,.azuredatabricks.net"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Or configure your proxy (Squid, Zscaler, etc.) to pass through HTTP/2 traffic to Databricks endpoints without terminating/re-establishing the connection.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 3: Reduce Result Set Size&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Large result sets take longer to stream, increasing the window for connection drops. Reduce what you pull to the client:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Instead of collecting all rows&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# df.collect()&amp;nbsp; # BAD -- pulls everything via gRPC&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Option A: Limit rows&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df.limit(10000).collect()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Option B: Use toPandas with Arrow (more efficient streaming)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;pdf = df.limit(10000).toPandas()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Option C: Write results to a table, then read via SQL Connector&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df.write.mode("overwrite").saveAsTable("my_catalog.my_schema.results_temp")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Then read with Databricks SQL Connector (HTTP-based, no gRPC issues)&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 4: Switch to Databricks SQL Connector for Result Fetching&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;Since the SQL Connector works on your network, use a &lt;/SPAN&gt;&lt;STRONG&gt;hybrid approach&lt;/STRONG&gt;&lt;SPAN&gt; -- Spark Connect for transformations, SQL Connector for result retrieval:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks.connect import DatabricksSession&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;from databricks import sql&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Use Spark Connect for computation&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;spark = DatabricksSession.builder.remote(...).getOrCreate()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df = spark.sql("SELECT ... complex transformation ...")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;df.write.mode("overwrite").saveAsTable("tmp.results")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;# Use SQL Connector (HTTP) for result retrieval&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;with sql.connect(&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;server_hostname="&amp;lt;workspace&amp;gt;.cloud.databricks.com",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;http_path="/sql/1.0/warehouses/&amp;lt;id&amp;gt;",&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;access_token="&amp;lt;pat&amp;gt;"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;) as conn:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cursor = conn.cursor()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cursor.execute("SELECT * FROM tmp.results")&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;results = cursor.fetchall()&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Solution 5: Increase Timeout on Network Devices&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If you control the network infrastructure, increase the idle timeout on the device killing the connection:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Device&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Setting&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Recommended Value&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;AWS ALB/NLB&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Idle timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;300-3600 seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Azure Application Gateway&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Connection idle timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;300+ seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Squid Proxy&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;connect_timeout / read_timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;3600 seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Zscaler&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;SSL inspection timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Bypass for Databricks&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Corporate Firewall&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;TCP idle timeout&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;3600 seconds&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
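&lt;P&gt;&lt;SPAN&gt;If you cannot change the device timeouts, a client-side retry with exponential backoff can recover from occasional stream resets. A generic sketch, not a databricks-connect API -- the retry_on_transient helper and the choice of which error strings count as transient are assumptions you should tune to the errors you actually see:&lt;/SPAN&gt;&lt;/P&gt;

```python
import time
import random

def retry_on_transient(fn, attempts=4, base_delay=1.0):
    # Retry fn() on exceptions that look like a dropped gRPC stream;
    # re-raise anything else. Exponential backoff with jitter.
    transient = ("UNAVAILABLE", "INTERNAL", "GOAWAY", "RST_STREAM")
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            last_try = attempt == attempts - 1
            if last_try or not any(t in str(exc) for t in transient):
                raise
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.5))

# Usage sketch: rows = retry_on_transient(lambda: df.limit(10000).collect())
```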
&lt;H3&gt;&lt;STRONG&gt;Solution 6: Use SSL Certificate Path (If TLS Issues)&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If your network uses TLS inspection (MITM proxy), the gRPC channel may fail silently:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH="/path/to/corporate-ca-bundle.crt"&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Or add the corporate CA to Python's certificate store:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;pip install certifi&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;cat /path/to/corporate-ca.pem &amp;gt;&amp;gt; $(python -c "import certifi; print(certifi.where())")&lt;/SPAN&gt;&lt;/P&gt;
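&lt;P&gt;&lt;SPAN&gt;Note that GRPC_DEFAULT_SSL_ROOTS_FILE_PATH only covers the gRPC channel; parts of the toolchain built on the requests library honor separate variables. A sketch covering both paths (adjust the bundle path to your environment):&lt;/SPAN&gt;&lt;/P&gt;

```shell
# gRPC channel (Spark Connect)
export GRPC_DEFAULT_SSL_ROOTS_FILE_PATH="/path/to/corporate-ca-bundle.crt"
# requests-based clients (Databricks SDK, SQL Connector)
export REQUESTS_CA_BUNDLE="/path/to/corporate-ca-bundle.crt"
export SSL_CERT_FILE="/path/to/corporate-ca-bundle.crt"
```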
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;Why SQL Connector Works but Spark Connect Doesn't&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;TABLE&gt;
&lt;TBODY&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Feature&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;Spark Connect (gRPC)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;STRONG&gt;SQL Connector (HTTP)&lt;/STRONG&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Protocol&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;HTTP/2 long-lived stream&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;HTTP/1.1 request/response&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Connection&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Persistent bidirectional&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Short-lived poll-based&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;During execution&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Connection appears idle&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;No connection held open&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Result delivery&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Server pushes via stream&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Client polls for results&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;TR&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Proxy compatibility&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Poor (many proxies break HTTP/2)&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;TD&gt;
&lt;P&gt;&lt;SPAN&gt;Excellent&lt;/SPAN&gt;&lt;/P&gt;
&lt;/TD&gt;
&lt;/TR&gt;
&lt;/TBODY&gt;
&lt;/TABLE&gt;
&lt;P&gt;&lt;SPAN&gt;The SQL Connector's poll-based model is inherently more resilient to network intermediaries because it doesn't maintain a long-lived connection that can be killed.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;When to File a Support Ticket&lt;/STRONG&gt;&lt;/H3&gt;
&lt;P&gt;&lt;SPAN&gt;If none of the above solutions work, file a Databricks support ticket with:&lt;/SPAN&gt;&lt;/P&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Workspace ID and region&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Query IDs of failed queries (from query history)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;The result&lt;/SPAN&gt;&lt;I&gt;&lt;SPAN&gt;fetch&lt;/SPAN&gt;&lt;/I&gt;&lt;SPAN&gt;duration_ms = 0 observation&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Network topology diagram (client to Databricks path)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;gRPC debug logs (SPARK&lt;/SPAN&gt;&lt;I&gt;&lt;SPAN&gt;CONNECT&lt;/SPAN&gt;&lt;/I&gt;&lt;SPAN&gt;LOG_LEVEL=debug)&lt;/SPAN&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;SPAN&gt;Confirmation that SQL Connector works on same network&lt;/SPAN&gt;&lt;/LI&gt;
&lt;/UL&gt;
&lt;P&gt;&lt;SPAN&gt;This may be a platform-level issue that Databricks engineering needs to investigate, especially if the gRPC stream termination is happening within Databricks' own infrastructure rather than in your network.&lt;/SPAN&gt;&lt;/P&gt;
&lt;H3&gt;&lt;STRONG&gt;References&lt;/STRONG&gt;&lt;/H3&gt;
&lt;UL&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/advanced" target="_blank"&gt;&lt;SPAN&gt;Databricks Connect Advanced Usage&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/python/troubleshooting" target="_blank"&gt;&lt;SPAN&gt;Databricks Connect Troubleshooting&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://docs.databricks.com/aws/en/dev-tools/databricks-connect/queries" target="_blank"&gt;&lt;SPAN&gt;Query Interruptions with Databricks Connect&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://grpc.io/docs/guides/keepalive/" target="_blank"&gt;&lt;SPAN&gt;gRPC Keepalive Guide&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;LI style="font-weight: 400;" aria-level="1"&gt;&lt;A href="https://community.databricks.com/t5/data-engineering/internal-grpc-errors-when-using-databricks-connect/td-p/112817" target="_blank"&gt;&lt;SPAN&gt;Internal gRPC Errors -- Databricks Community&lt;/SPAN&gt;&lt;/A&gt;&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Fri, 10 Apr 2026 05:03:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-connect-serverless-grpc-issue/m-p/154031#M54058</guid>
      <dc:creator>anuj_lathi</dc:creator>
      <dc:date>2026-04-10T05:03:50Z</dc:date>
    </item>
  </channel>
</rss>

