I am reading a source table that gets updated every day. The updates are usually appends or merges (upserts), but the table is occasionally overwritten entirely for other reasons.
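For context, here is a sketch of the kind of daily job that writes to the source table (the staging table name, join key, and DataFrame names are my own placeholders, not the actual job):

from delta.tables import DeltaTable

# Hypothetical daily upsert; `db_name.updates_staging` and the `id` key are assumptions.
daily_updates = spark.read.table("db_name.updates_staging")
target = DeltaTable.forName(spark, "db_name.table_name")
(
    target.alias("t")
    .merge(daily_updates.alias("u"), "t.id = u.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)

# Occasionally the table is rebuilt in place instead, roughly like:
# rebuilt_df.write.format("delta").mode("overwrite").saveAsTable("db_name.table_name")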
df = (
    spark.readStream.format("delta")
    .option("ignoreChanges", "true")
    .option("startingVersion", xx)
    .table("db_name.table_name")
)
I also have the following Spark configuration in the DLT pipeline settings:
"spark.sql.files.ignoreMissingFiles": "true",
"spark.databricks.delta.schema.autoMerge.enabled": "true"
But it throws the error below when I try to refresh the pipeline, and it also fails when I do a full refresh.
terminated with exception: Detected schema change:
Please try restarting the query. If this issue repeats across query restarts without
making progress, you have made an incompatible schema change and need to start your
query from scratch using a new checkpoint directory.
When I instead read the table's schema first and pass it to readStream,
schema = spark.read.table("db_name.table_name").schema
df = (
    spark.readStream.schema(schema)
    .format("delta")
    .option("ignoreChanges", "true")
    .option("startingVersion", xx)
    .table("db_name.table_name")
)
it throws the following error:
pyspark.sql.utils.AnalysisException: User specified schema not supported with `table`