SAS token issue for long running micro-batches
11-07-2024 07:16 AM
Hi everyone,
I'm having an issue with some of our Databricks workloads. We process these workloads with Structured Streaming's foreachBatch method. Whenever we perform a full reload of one of our data sources, we get the following error:
[STREAM_FAILED] Query [id = 00000000-0000-0000-0000-000000000000, runId = 00000000-0000-0000-0000-000000000000] terminated with exception: Failed to acquire a SAS token for get-status on /checkpoints/commits/0 due to java.util.concurrent.ExecutionException: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=00000000-0000-0000-0000-000000000000 ErrorClass=INVALID_PARAMETER_VALUE.INVALID_PARAMETER_VALUE] Input path abfss://some-container@somestorageaccount.dfs.core.windows.net/ overlaps with other external tables or volumes. Conflicting tables/volumes: some_catalog.some_schema.some_table SQLSTATE: XXKST
The error message is quite strange, since we don't have any overlapping tables or checkpoints. We have noticed that this only happens when the micro-batches become so large that a single micro-batch takes more than 1 hour to complete.
Could it be that the SAS token expires after 1 hour, which causes the checkpoint commit to fail?
Thanks
Labels:
- Delta Lake
- Spark
11-23-2024 10:23 AM
Can you please confirm there are no external locations or volumes that could cause this overlap? What do you actually have at "some_catalog.some_schema.some_table" and at "abfss://some-container@somestorageaccount.dfs.core.windows.net/"?
Also, just curious: are you saying a micro-batch in your streaming application is expected to take more than an hour? Could you clarify the use case if possible?
11-25-2024 05:06 AM
Hi @VZLA,
I can indeed confirm there are no overlapping locations. We eventually got a successful run by scaling up the cluster until the micro-batches stayed below 1 hour. The error message really threw me off, though, so I was wondering if and how it is related to micro-batch size.
What we are trying to do is process a table's change data feed (CDF) and merge the changes into another table. In this particular case, we had to reprocess the whole table, which resulted in some micro-batches of over 40 billion records. Looking at the Spark UI, I noticed it reads 1,000 files per micro-batch, so the approach now is to leverage the maxFilesPerTrigger option to tune the micro-batch size.
Thanks
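The tuning step described above can be sketched with some back-of-the-envelope arithmetic. This is a hypothetical helper, not anything from Spark or Databricks: it assumes the commit fails because the SAS token lifetime is roughly 1 hour, and scales the observed files-per-batch down so a batch finishes well inside that window. The resulting number is a starting point for `maxFilesPerTrigger`, to be refined against real batch timings.

```python
# Hypothetical sizing helper (not part of any Spark/Databricks API).
# Assumption from the thread: commits fail when a micro-batch runs longer
# than the ~1 hour SAS token lifetime.
def suggest_max_files_per_trigger(observed_files, observed_minutes,
                                  token_lifetime_minutes=60,
                                  safety_factor=0.5):
    """Return a files-per-batch cap so each batch finishes well inside
    the token lifetime. safety_factor leaves headroom for skewed batches."""
    minutes_per_file = observed_minutes / observed_files
    time_budget = token_lifetime_minutes * safety_factor
    return max(1, int(time_budget / minutes_per_file))

# Illustrative numbers: 1,000 files per batch took ~90 minutes,
# so aim for roughly a third as many files per trigger.
print(suggest_max_files_per_trigger(1000, 90))  # → 333
```

The chosen value would then be passed as the `maxFilesPerTrigger` option on the streaming read; throughput is rarely linear in file count, so it is worth re-measuring after the first tuned run.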

