03-07-2022 06:24 AM
Hi, I'm getting this error quite frequently on a few tables. I checked on S3 and the partition exists and the file is there in the partition.
error: Spectrum Scan Error: DeltaManifest
code: 15005
context: Error fetching Delta Lake manifest delta/product/sub_product/_symlink_format_manifest/data_date=2022-03-04/data_hour=0/manifest Message: S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey,Rid P66ZVJ3X8MNZFEJH,ExtRid b4eWb8sgxF/50
query: 84792889
location: scan_range_manager.cpp:1171
process: worker_thread [pid=9064]
On DESCRIBE HISTORY I see:
SnapshotIsolation for WRITE operations with metrics at 0: {"numFiles": "0", "numOutputRows": "0", "numOutputBytes": "0"}
And WriteSerializable for WRITE operations with files and output rows.
Only writes appear in the history; all the data is being loaded by Databricks jobs.
Any idea what could be happening? My workaround, since it's little data, is to delete the files that are there but that it can't find, and reprocess; but I'm trying to get to the root cause of this issue.
Could there be a correlation between the error and when I run VACUUM (default 7 days)? I don't think so, because it's a table that doesn't have reads/writes running for more than 30 min per query. But maybe this helps!
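For reference, this is roughly the check I run when it happens: read the symlink manifest for the failing partition and verify that it, and every parquet path it lists, still exists on S3. A sketch only; the bucket name is a placeholder, the manifest key is the one from the error above, and I'm assuming plain boto3:

import boto3

# Placeholders: adjust the bucket; the manifest key is the one from the error message above.
bucket = "my-bucket"
manifest_key = "delta/product/sub_product/_symlink_format_manifest/data_date=2022-03-04/data_hour=0/manifest"

s3 = boto3.client("s3")

# The symlink manifest is a plain-text file with one s3:// path per line,
# each pointing at a parquet data file of the current table version.
manifest = s3.get_object(Bucket=bucket, Key=manifest_key)["Body"].read().decode("utf-8")

for line in manifest.splitlines():
    if not line.strip():
        continue
    # Lines look like s3://<bucket>/<key>
    file_bucket, file_key = line.replace("s3://", "", 1).split("/", 1)
    try:
        s3.head_object(Bucket=file_bucket, Key=file_key)
        print("OK     ", line)
    except s3.exceptions.ClientError:
        # A 404 here means the manifest points at a data file that no longer
        # exists, e.g. removed by VACUUM or rewritten by a later commit.
        print("MISSING", line)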
thanks!!!
Accepted Solutions
03-07-2022 07:50 AM
You can try to change the isolation level as described here: https://docs.databricks.com/delta/concurrency-control.html
Additionally, S3 doesn't support concurrent writes; that problem is solved by the S3 commit service. In theory the commit service handles that issue, and you can read about it here: https://docs.databricks.com/administration-guide/cloud-configurations/aws/s3-commit-service.html
It's hard for me to say, as I've been using S3 since the beginning of that service, but Azure Data Lake Storage is like a whole new, better world compared to using S3 for a delta/data lake.
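For example, something along these lines switches a table to the stricter level (just a sketch; "schema.table" is a placeholder for your table):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Delta supports 'WriteSerializable' (the default) and the stricter 'Serializable';
# the property below switches the table to the stricter level.
spark.sql(
    "ALTER TABLE schema.table "
    "SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')"
)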
03-07-2022 08:03 AM
Thanks @Hubert Dudek, so I can try setting these failing tables to Serializable by default, just in case. From the history I understand that this is what it's currently using, but it can't hurt.
As seen here: https://docs.databricks.com/delta/optimizations/isolation-level.html
ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')
And regarding the commit service, I never have more than 1 cluster writing to those tables, and just to be sure I have this Spark setting on the jobs:
.config("spark.databricks.delta.multiClusterWrites.enabled", "false")
BTW, I've never seen any serialization error in the jobs; do they show up in DESCRIBE HISTORY?
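For context, this is roughly how I look at the history (a sketch; the table name is a placeholder). As far as I can tell, the isolationLevel column only shows the level each write used, not any conflict errors:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Pull the table history and keep the columns relevant to isolation:
history = spark.sql("DESCRIBE HISTORY schema.table")
history.select("version", "timestamp", "operation", "isolationLevel", "operationMetrics") \
    .orderBy("version", ascending=False) \
    .show(truncate=False)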
thanks!
03-08-2022 05:07 AM
@Hubert Dudek, I'll add that sometimes just running:
GENERATE symlink_format_manifest FOR TABLE schema.table
solves it, but how can the symlink manifest get broken?
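In case it's useful to someone else, this is the sketch I run to regenerate it, plus the table property that, if I'm reading the Delta docs right, should keep the manifest updated automatically after every write ("schema.table" is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Regenerate the symlink manifest by hand (this is what fixes it for me):
spark.sql("GENERATE symlink_format_manifest FOR TABLE schema.table")

# Assumption based on the Delta docs: with this property set, Delta should
# rewrite the manifest automatically after each write so it doesn't go stale.
spark.sql(
    "ALTER TABLE schema.table "
    "SET TBLPROPERTIES ('delta.compatibility.symlinkFormatManifest.enabled' = 'true')"
)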
Thanks!

