<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic SAS token issue for long running micro-batches in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/98093#M39604</link>
    <description>Databricks Community thread: a Structured Streaming foreachBatch workload fails to acquire a SAS token for its checkpoint whenever a single micro-batch takes longer than one hour; worked around by tuning micro-batch size with maxFilesPerTrigger.</description>
    <pubDate>Thu, 07 Nov 2024 15:16:51 GMT</pubDate>
    <dc:creator>deecee</dc:creator>
    <dc:date>2024-11-07T15:16:51Z</dc:date>
    <item>
      <title>SAS token issue for long running micro-batches</title>
      <link>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/98093#M39604</link>
      <description>&lt;P&gt;Hi everyone,&lt;/P&gt;
&lt;P&gt;I'm having an issue with some of our Databricks workloads, which we process using the foreachBatch stream processing method. Whenever we perform a full reload of one of our data sources, we get the following error.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;[STREAM_FAILED] Query [id = 00000000-0000-0000-0000-000000000000, runId = 00000000-0000-0000-0000-000000000000] terminated with exception: Failed to acquire a SAS token for get-status on /checkpoints/commits/0 due to java.util.concurrent.ExecutionException: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=00000000-0000-0000-0000-000000000000 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path abfss://some-container@somestorageaccount.dfs.core.windows.net/ overlaps with other external tables or volumes. Conflicting tables/volumes: some_catalog.some_schema.some_table SQLSTATE: XXKST&lt;/LI-CODE&gt;
&lt;P&gt;The error message is quite strange, since we don't have any overlapping tables or checkpoints. We have noticed that this only happens when the micro-batches grow so large that a single one takes more than an hour to complete.&lt;/P&gt;
&lt;P&gt;Could it be that the SAS token expires after one hour, causing the checkpoint commit to fail?&lt;/P&gt;
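&lt;P&gt;For reference, here is a simplified sketch of the job; the table, column, and path names are placeholders, not our real ones.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Simplified sketch (placeholder names): a stream merged into a target
# table via foreachBatch; on a full reload a single micro-batch can run
# for well over an hour.
def merge_batch(batch_df, batch_id):
    batch_df.createOrReplaceTempView("updates")
    # The MERGE below is the long-running part of each micro-batch.
    batch_df.sparkSession.sql("""
        MERGE INTO some_catalog.some_schema.some_table AS t
        USING updates AS s ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)

(spark.readStream
    .table("some_catalog.some_schema.source_table")
    .writeStream
    .foreachBatch(merge_batch)
    .option("checkpointLocation",
            "abfss://some-container@somestorageaccount.dfs.core.windows.net/checkpoints")
    .start())&lt;/LI-CODE&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>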
      <pubDate>Thu, 07 Nov 2024 15:16:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/98093#M39604</guid>
      <dc:creator>deecee</dc:creator>
      <dc:date>2024-11-07T15:16:51Z</dc:date>
    </item>
    <item>
      <title>Re: SAS token issue for long running micro-batches</title>
      <link>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/99861#M40117</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/118118"&gt;@deecee&lt;/a&gt;&lt;/P&gt;
&lt;P&gt;Can you please confirm there are no external locations or volumes that could lead to this overlap? What do you actually have in "some_catalog.some_schema.some_table" and at "abfss://some-container@somestorageaccount.dfs.core.windows.net/"?&lt;BR /&gt;Also, just curious: are you saying a micro-batch in your streaming application is expected to take more than an hour? Could you please clarify the use case if possible?&lt;/P&gt;
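&lt;P&gt;For example, something like this would show what actually lives at each path (a sketch; it assumes a Unity Catalog-enabled workspace and uses the table name from your error message):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# List the external locations defined in the metastore and their URLs.
spark.sql("SHOW EXTERNAL LOCATIONS").show(truncate=False)

# DESCRIBE DETAIL returns the storage location of the conflicting table.
spark.sql("DESCRIBE DETAIL some_catalog.some_schema.some_table") \
    .select("location").show(truncate=False)&lt;/LI-CODE&gt;</description>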
      <pubDate>Sat, 23 Nov 2024 18:23:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/99861#M40117</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-11-23T18:23:31Z</dc:date>
    </item>
    <item>
      <title>Re: SAS token issue for long running micro-batches</title>
      <link>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/99954#M40156</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;I can confirm there are no overlapping locations. We eventually got a successful run by increasing the cluster size until the micro-batches stayed under one hour. I was really thrown off by the error message, though, so I was wondering if and how it is related to micro-batch size.&lt;/P&gt;
&lt;P&gt;What we are trying to do is process a table's CDF stream and merge the changes into another table. In this particular case we had to reprocess the whole table, which resulted in some micro-batches of over 40 billion records. Looking at the Spark UI, I noticed that it reads 1,000 files per micro-batch, so the approach now is to use the &lt;FONT face="courier new,courier"&gt;maxFilesPerTrigger&lt;/FONT&gt; option to tune the micro-batch size.&lt;/P&gt;
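&lt;P&gt;The tuned read looks roughly like this (a sketch with placeholder names; the exact file cap is still something we're experimenting with):&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;# Sketch (placeholder names): cap the number of files per micro-batch so
# a single batch finishes well within the one-hour token lifetime.
cdf_stream = (spark.readStream
    .option("readChangeFeed", "true")   # read the table's change data feed
    .option("maxFilesPerTrigger", 250)  # tune down from the 1,000 we observed
    .table("some_catalog.some_schema.source_table"))&lt;/LI-CODE&gt;
&lt;P&gt;Thanks&lt;/P&gt;</description>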
      <pubDate>Mon, 25 Nov 2024 13:06:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/sas-token-issue-for-long-running-micro-batches/m-p/99954#M40156</guid>
      <dc:creator>deecee</dc:creator>
      <dc:date>2024-11-25T13:06:05Z</dc:date>
    </item>
  </channel>
</rss>

