Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta, the specified key does not exist error

alejandrofm
Valued Contributor

Hi, I'm getting this error too frequently on a few tables. When I check on S3, the partition exists and the file is there in the partition.

error: Spectrum Scan Error: DeltaManifest

code: 15005

context: Error fetching Delta Lake manifest delta/product/sub_product/_symlink_format_manifest/data_date=2022-03-04/data_hour=0/manifest Message: S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey,Rid P66ZVJ3X8MNZFEJH,ExtRid b4eWb8sgxF/50

query: 84792889

location: scan_range_manager.cpp:1171

process: worker_thread [pid=9064]

On DESCRIBE HISTORY I see:

SnapshotIsolation for WRITE operations whose metrics are all 0: {"numFiles": "0", "numOutputRows": "0", "numOutputBytes": "0"}

And WriteSerializable for WRITE operations that do have files and output rows.

There are only writes in the history; all the data is being loaded by Databricks jobs.
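
For reference, this is roughly the query I'm looking at (using schema.table as a placeholder for the affected table):

-- placeholder table name; shows the operation, isolationLevel and operationMetrics for recent commits
DESCRIBE HISTORY schema.table LIMIT 20;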

Any idea what could be happening? Since it's only a little data, my workaround is to delete the files that are there but that it can't find, and then reprocess; but I'm trying to get to the root cause of this issue.

Could there be a correlation between the error and when I run VACUUM (with the default 7-day retention)? I don't think so, because no query reads or writes this table for more than 30 minutes, but maybe this helps!
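
In case it's relevant, I can also check what VACUUM would actually delete before running it; something like this should be safe (schema.table is again a placeholder):

-- placeholder table name; 168 hours = the default 7-day retention; DRY RUN only lists files, deletes nothing
VACUUM schema.table RETAIN 168 HOURS DRY RUN;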

thanks!!!

1 ACCEPTED SOLUTION


Hubert-Dudek
Esteemed Contributor III

You can try changing the isolation level as described here: https://docs.databricks.com/delta/concurrency-control.html

Additionally, S3 doesn't support concurrent writes; that problem is addressed by the Databricks S3 commit service. In theory the commit service solves this issue, but you can read about it here: https://docs.databricks.com/administration-guide/cloud-configurations/aws/s3-commit-service.html

It's hard for me to say, as I have been using S3 since the beginning of that service, but Azure Data Lake Storage is a whole new, better world compared to using S3 for a Delta/data lake.


3 REPLIES


alejandrofm
Valued Contributor

Thanks @Hubert Dudek, so I can try setting these failing tables to Serializable by default, just in case. From the history I understand that this is what they are already using, but it can't hurt.

As seen here: https://docs.databricks.com/delta/optimizations/isolation-level.html

ALTER TABLE <table-name> SET TBLPROPERTIES ('delta.isolationLevel' = 'Serializable')
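
And to confirm what a table is currently using, I suppose I can check the property directly (schema.table is a placeholder):

-- placeholder table name; returns delta.isolationLevel only if it has been set explicitly
SHOW TBLPROPERTIES schema.table ('delta.isolationLevel');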

As for the commit service, I never have more than one cluster writing to those tables, and to be safe I have this Spark setting on the jobs:

.config("spark.databricks.delta.multiClusterWrites.enabled", "false")

BTW, I have never seen any serialization error in the jobs; would they show up in DESCRIBE HISTORY?

thanks!

alejandrofm
Valued Contributor

@Hubert Dudek, I'll add that sometimes just running:

GENERATE symlink_format_manifest FOR TABLE schema.table

solves it. But how can the symlink manifest get broken?
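
In the meantime I'm considering enabling automatic manifest generation, so the manifest gets regenerated on every write; if I'm reading the docs right it's just a table property (schema.table as a placeholder):

-- placeholder table name; keeps the symlink manifest in sync with every table update
ALTER TABLE schema.table SET TBLPROPERTIES ('delta.compatibility.symlinkFormatManifest.enabled' = 'true');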

Thanks!
