<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks SQL connection becomes stale in long-running app in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158026#M54648</link>
    <description>&lt;P&gt;I’m building a Databricks App that polls a SQL Warehouse roughly every 30 seconds to retrieve updated data.&lt;/P&gt;&lt;P&gt;To avoid the overhead of repeatedly opening new connections, I’m currently caching the Databricks SQL connection using lru_cache.&lt;/P&gt;&lt;PRE&gt;from functools import lru_cache

from databricks import sql
from databricks.sdk.core import Config  # assumed source of Config
from databricks.sql.client import Connection

from config import settings

cfg = Config()


@lru_cache(maxsize=1)
def get_connection() -&amp;gt; Connection:
    """Return a cached Databricks SQL connection using the configured warehouse."""
    return sql.connect(
        server_hostname=cfg.host,
        http_path=settings.sql_warehouse_http_path,
        credentials_provider=lambda: cfg.authenticate,
        use_cloud_fetch=False,
    )


def execute_query(query: str, params: dict | list | None = None) -&amp;gt; None:
    """Execute a SQL statement that returns no results (DDL / DML)."""
    conn = get_connection()
    with conn.cursor() as cursor:
        cursor.execute(query, parameters=params)&lt;/PRE&gt;&lt;HR /&gt;&lt;H3&gt;The issue&lt;/H3&gt;&lt;P&gt;After running fine for a while (roughly 10 hours, not exact), the app starts failing with this error:&lt;/P&gt;&lt;PRE&gt;2026-06-01 06:10:27,780 [INFO] databricks.sql.thrift_backend - Error during request to server:
{"method": "ExecuteStatement",
 "session-id": "...",
 "http-code": 200,
 "error-message": "",
 "original-exception": "ExecuteStatement command can only be retried for codes 429 and 503",
 "no-retry-reason": "non-retryable error",
 "attempt": "1/30"}&lt;/PRE&gt;&lt;P&gt;Once this happens, the app stops working reliably until I redeploy it. After redeploying, everything works again.&lt;/P&gt;&lt;HR /&gt;&lt;H3&gt;My assumption&lt;/H3&gt;&lt;P&gt;Because the issue only appears after a long runtime, I strongly suspect that the cached connection/session eventually becomes stale or invalid.&lt;/P&gt;&lt;P&gt;Given that I am reusing the same connection object for all queries, this seems like the most likely explanation.&lt;/P&gt;&lt;HR /&gt;&lt;H3&gt;Questions&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;Is it expected that a long-lived cached Databricks SQL connection eventually becomes invalid?&lt;/LI&gt;&lt;LI&gt;What is the recommended approach for this kind of long-running polling app?&lt;UL&gt;&lt;LI&gt;periodic reconnection (e.g. every hour)?&lt;/LI&gt;&lt;LI&gt;retry + reconnect on failure?&lt;/LI&gt;&lt;LI&gt;or something like connection pooling provided by Databricks (if it exists; I could not find it myself)?&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Does Databricks have any built-in mechanism to handle stale sessions automatically for long-lived connections?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks in advance! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 01 Jun 2026 06:50:14 GMT</pubDate>
    <dc:creator>mnissen1337</dc:creator>
    <dc:date>2026-06-01T06:50:14Z</dc:date>
    <item>
      <title>Databricks SQL connection becomes stale in long-running app</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158026#M54648</link>
      <description>&lt;P&gt;I’m building a Databricks App that polls a SQL Warehouse roughly every 30 seconds to retrieve updated data.&lt;/P&gt;&lt;P&gt;To avoid the overhead of repeatedly opening new connections, I’m currently caching the Databricks SQL connection using lru_cache.&lt;/P&gt;&lt;PRE&gt;from functools import lru_cache

from databricks import sql
from databricks.sdk.core import Config  # assumed source of Config
from databricks.sql.client import Connection

from config import settings

cfg = Config()


@lru_cache(maxsize=1)
def get_connection() -&amp;gt; Connection:
    """Return a cached Databricks SQL connection using the configured warehouse."""
    return sql.connect(
        server_hostname=cfg.host,
        http_path=settings.sql_warehouse_http_path,
        credentials_provider=lambda: cfg.authenticate,
        use_cloud_fetch=False,
    )


def execute_query(query: str, params: dict | list | None = None) -&amp;gt; None:
    """Execute a SQL statement that returns no results (DDL / DML)."""
    conn = get_connection()
    with conn.cursor() as cursor:
        cursor.execute(query, parameters=params)&lt;/PRE&gt;&lt;HR /&gt;&lt;H3&gt;The issue&lt;/H3&gt;&lt;P&gt;After running fine for a while (roughly 10 hours, not exact), the app starts failing with this error:&lt;/P&gt;&lt;PRE&gt;2026-06-01 06:10:27,780 [INFO] databricks.sql.thrift_backend - Error during request to server:
{"method": "ExecuteStatement",
 "session-id": "...",
 "http-code": 200,
 "error-message": "",
 "original-exception": "ExecuteStatement command can only be retried for codes 429 and 503",
 "no-retry-reason": "non-retryable error",
 "attempt": "1/30"}&lt;/PRE&gt;&lt;P&gt;Once this happens, the app stops working reliably until I redeploy it. After redeploying, everything works again.&lt;/P&gt;&lt;HR /&gt;&lt;H3&gt;My assumption&lt;/H3&gt;&lt;P&gt;Because the issue only appears after a long runtime, I strongly suspect that the cached connection/session eventually becomes stale or invalid.&lt;/P&gt;&lt;P&gt;Given that I am reusing the same connection object for all queries, this seems like the most likely explanation.&lt;/P&gt;&lt;HR /&gt;&lt;H3&gt;Questions&lt;/H3&gt;&lt;OL&gt;&lt;LI&gt;Is it expected that a long-lived cached Databricks SQL connection eventually becomes invalid?&lt;/LI&gt;&lt;LI&gt;What is the recommended approach for this kind of long-running polling app?&lt;UL&gt;&lt;LI&gt;periodic reconnection (e.g. every hour)?&lt;/LI&gt;&lt;LI&gt;retry + reconnect on failure?&lt;/LI&gt;&lt;LI&gt;or something like connection pooling provided by Databricks (if it exists; I could not find it myself)?&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;Does Databricks have any built-in mechanism to handle stale sessions automatically for long-lived connections?&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;Thanks in advance! &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 06:50:14 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158026#M54648</guid>
      <dc:creator>mnissen1337</dc:creator>
      <dc:date>2026-06-01T06:50:14Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL connection becomes stale in long-running app</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158035#M54650</link>
      <description>&lt;DIV&gt;You can expect long-lived cached SQL connections to become stale because of idle and session timeouts, which exist for resource governance (warehouse autoscaling), security, and optimization (TLS drops, backend session expiration, routing). When that happens, the underlying Thrift session is invalidated.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;You can take the following approaches:&lt;/DIV&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Connection lifecycle management - &lt;/STRONG&gt;Implement a reconnect-on-failure wrapper, or use SQLAlchemy with the Databricks dialect. Its QueuePool accepts parameters such as pool_pre_ping (True validates each connection before use) and pool_recycle (1800 forces a refresh every 30 minutes), which address staleness. More details &lt;A href="https://docs.databricks.com/aws/en/dev-tools/sqlalchemy" target="_self"&gt;here&lt;/A&gt;.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Idempotent retries - &lt;/STRONG&gt;Catch session errors, discard the connection, create a new one, and retry with exponential backoff.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Warehouse - &lt;/STRONG&gt;Disable auto stop on the warehouse if you require continuous availability.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Plan for client-side recycling, since recycling connections is the user's responsibility.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 07:56:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158035#M54650</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-06-01T07:56:04Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL connection becomes stale in long-running app</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158039#M54652</link>
      <description>&lt;P&gt;Looking at the documentation for SQLAlchemy with Databricks, it seems you need to specify a PAT when creating the engine. Is it correct that it does not allow any other authentication method? It seems to be just a wrapper on top of the databricks sql module, so can't we use M2M authentication (similar to what I did in my post with credentials_provider, passing it the relevant information from cfg.authenticate)?&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 08:57:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158039#M54652</guid>
      <dc:creator>mnissen1337</dc:creator>
      <dc:date>2026-06-01T08:57:22Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks SQL connection becomes stale in long-running app</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158045#M54654</link>
      <description>&lt;P&gt;The SQLAlchemy dialect is a wrapper around the native Databricks SQL connector. You can try passing the authentication options supported by the underlying connector directly in the connect_args dictionary of the SQLAlchemy engine.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import os
from sqlalchemy import create_engine, text
from databricks.sql.auth import AuthType
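
# --- Hedged sketch, not part of the original reply ---
# One way to do the "reconnect on failure" and "retry with exponential
# backoff" ideas from earlier in this thread. ReconnectingRunner, open_conn
# and work are hypothetical names; open_conn is any zero-argument callable
# returning an object with close(), for example a function wrapping
# sql.connect(...) from the original post. Catching bare Exception is a
# simplification: narrow it to the connector's session errors in real code,
# and only retry statements that are safe to run twice.
import time


class ReconnectingRunner:
    """Reuse one connection; drop and reopen it when a call fails."""

    def __init__(self, open_conn, attempts=3, base_delay=1.0, sleep=time.sleep):
        self._open_conn = open_conn
        self._attempts = attempts
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so tests need not wait
        self._conn = None

    def _discard(self):
        """Forget the cached connection, closing it on a best-effort basis."""
        conn, self._conn = self._conn, None
        if conn is not None:
            try:
                conn.close()
            except Exception:
                pass  # the session may already be gone on the server

    def run(self, work):
        """Call work(conn), reconnecting with backoff between failed attempts."""
        for attempt in range(self._attempts):
            try:
                if self._conn is None:
                    self._conn = self._open_conn()
                return work(self._conn)
            except Exception:
                self._discard()
                if attempt + 1 == self._attempts:
                    raise  # out of attempts: surface the last error
                # With the default base_delay this waits 1s, 2s, 4s, ...
                self._sleep(self._base_delay * 2 ** attempt)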

# Workspace and app credentials
DATABRICKS_HOST = os.environ.get("DBX_HOST")
HTTP_PATH = os.environ.get("DBX_HTTP_PATH")
AZURE_CLIENT_ID = os.environ.get("AZURE_APP_ID")
AZURE_CLIENT_SECRET = os.environ.get("AZURE_APP_SECRET")
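
# Hedged sketch, not part of the original reply: the pool settings suggested
# earlier in this thread, gathered in one place. POOL_OPTIONS is a
# hypothetical name; the keys are standard SQLAlchemy create_engine keyword
# arguments and could be passed as create_engine(url, **POOL_OPTIONS).
# Whether pre-ping catches every stale Databricks session is an assumption
# to verify against your own workload.
POOL_OPTIONS = {
    "pool_pre_ping": True,  # test a pooled connection before handing it out
    "pool_recycle": 1800,   # replace connections older than 1800 s (30 min)
}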

# Build the engine
engine = create_engine(
    f"databricks://token:not_used@{DATABRICKS_HOST}?http_path={HTTP_PATH}&amp;amp;catalog=main&amp;amp;schema=default",
    connect_args={
        "auth_type": AuthType.AZURE_SP_M2M.value,
        "azure_client_id": AZURE_CLIENT_ID,
        "azure_client_secret": AZURE_CLIENT_SECRET,
    }
)&lt;/LI-CODE&gt;&lt;P&gt;In the meantime, you can remove the lru_cache decorator until the new approach is working.&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 10:41:07 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-sql-connection-becomes-stale-in-long-running-app/m-p/158045#M54654</guid>
      <dc:creator>balajij8</dc:creator>
      <dc:date>2026-06-01T10:41:07Z</dc:date>
    </item>
  </channel>
</rss>

