11-19-2023 06:29 PM
Hi all,
I am currently running a job that upserts a table by reading from the Delta change data feed of my silver table. Here is the relevant snippet of code:
from datetime import datetime, timedelta

rds_changes = spark.read.format("delta") \
    .option("readChangeFeed", "true") \
    .option("startingVersion", 0) \
    .table("main.default.gold_table") \
    .where(f"_commit_timestamp >= '{(datetime.now() - timedelta(hours=1)).strftime('%Y-%m-%d %H:%M:%S')}'")
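As an aside (a sketch, not from the original post): the Delta change feed reader also accepts a startingTimestamp option, which can replace the startingVersion-0-plus-manual-filter pattern above. The helper name cutoff_ts is hypothetical, and note the timestamp still has to fall inside the retained log window or the read will error.

```python
from datetime import datetime, timedelta

def cutoff_ts(now=None, hours=1):
    """Format an 'N hours ago' timestamp the way Delta expects (hypothetical helper)."""
    now = now or datetime.now()
    return (now - timedelta(hours=hours)).strftime("%Y-%m-%d %H:%M:%S")

# Usage in a Databricks notebook (assumes a live `spark` session):
# rds_changes = (spark.read.format("delta")
#     .option("readChangeFeed", "true")
#     .option("startingTimestamp", cutoff_ts())
#     .table("main.default.gold_table"))
```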
Here is the error returned:
com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: No file found in the directory: s3://databricks-workspace-stack-70da1-metastore-bucket/60ed403c-0a54-4f42-8b8a-73b8cea1bdc3/tables/6d4a9b3d-f88b-436e-be1b-09852f605f4c/_delta_log.
I have done the following:
Does anyone have any idea why this happens? The job was working fine previously without any changes.
11-20-2023 12:59 AM
Hey @Retired_mod, found that the issue was due to a truncated Delta log. Thanks for the help, man!
10-22-2024 01:39 PM
What was the fix?
11-14-2024 01:42 AM
1) Check the first version for which the change data feed was enabled:
DESCRIBE HISTORY `table_name`;
2) Use that version instead of 0 in .option("startingVersion", x)
02-06-2025 06:19 AM
Can you please describe some more on option 2)
02-06-2025 07:41 AM
You would run DESCRIBE HISTORY `table_name`; to check which versions are available. If the delta log is truncated for some reason, you will not find a version 0. Use the oldest version you can find instead of 0. For example, if the oldest version you can find in the delta log is 10, use .option("startingVersion", 10).
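To make that concrete, here is a sketch (not from the thread) of picking the oldest surviving version programmatically. earliest_available_version is a hypothetical helper, and the commented lines assume a live `spark` session in a Databricks notebook.

```python
def earliest_available_version(versions):
    """Return the oldest version number still listed by DESCRIBE HISTORY (hypothetical helper)."""
    if not versions:
        raise ValueError("DESCRIBE HISTORY returned no rows")
    return min(versions)

# In a notebook (assumes a live `spark` session):
# versions = [row["version"] for row in
#             spark.sql("DESCRIBE HISTORY main.default.gold_table").collect()]
# start = earliest_available_version(versions)  # e.g. 10 if version 0 is gone
# rds_changes = (spark.read.format("delta")
#     .option("readChangeFeed", "true")
#     .option("startingVersion", start)
#     .table("main.default.gold_table"))
```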