DLT Spark readStream fails on a source table that is overwritten
05-27-2023 09:09 PM
I am reading a source table that gets updated every day. Updates are usually appends/merges, but the table is occasionally overwritten for other reasons.
df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", "true").option("startingVersion", xx).table("db_name.table_name")
I also have the following spark configuration in DLT settings:
"spark.sql.files.ignoreMissingFiles": "true",
"spark.databricks.delta.schema.autoMerge.enabled": "true"
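For context, these options would sit in the pipeline's `configuration` map in the DLT settings JSON, roughly like this (a sketch; the key names and values are the ones quoted above):

```json
{
  "configuration": {
    "spark.sql.files.ignoreMissingFiles": "true",
    "spark.databricks.delta.schema.autoMerge.enabled": "true"
  }
}
```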
But it throws this error when I try to refresh the pipeline. It also fails when I do a full refresh.
terminated with exception: Detected schema change:
Please try restarting the query. If this issue repeats across query restarts without
making progress, you have made an incompatible schema change and need to start your
query from scratch using a new checkpoint directory.
When I try to readStream with an explicit schema,
schema = spark.read.table("db_name.table_name").schema
df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", "true").option("startingVersion", xx).table("db_name.table_name")
it throws the following error:
pyspark.sql.utils.AnalysisException: User specified schema not supported with `table`
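For what it's worth, the Delta streaming source does not accept a user-specified schema at all; it always reads the schema from the table's transaction log, so the `.schema(...)` call can simply be dropped. A minimal sketch of the supported pattern (the table name and starting version are placeholders from the post; `skipChangeCommits` is the newer replacement for the deprecated `ignoreChanges` and assumes DBR 12.1+ — on older runtimes keep `ignoreChanges`):

```python
# Sketch: stream from a Delta table without a user-specified schema.
# Delta infers the schema from the table's transaction log, so .schema(...)
# is not needed (and is rejected when combined with .table()).
df = (
    spark.readStream
    .format("delta")
    # skipChangeCommits skips commits that rewrite files (e.g. overwrites)
    # instead of failing; ignoreChanges instead re-emits rewritten rows.
    .option("skipChangeCommits", "true")
    .option("startingVersion", "xx")  # placeholder version from the post
    .table("db_name.table_name")
)
```

Note that an incompatible schema change (such as an overwrite that drops or retypes columns) still requires restarting the query from a new checkpoint directory, as the first error message says.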
- Labels: DLT, Schema, Source Table
06-05-2023 12:31 AM
Hi, could you please confirm the DLT and DBR versions?
Also, please tag @Debayan in your next response, which will notify me. Thank you!

