DLT Spark readStream fails on a source table that is overwritten
05-27-2023 09:09 PM
I am reading a source table that gets updated every day. Updates are usually appends/merges, but the table is occasionally overwritten for other reasons.
df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", "true").option("startingVersion", xx).table("db_name.table_name")
I also have the following spark configuration in DLT settings:
"spark.sql.files.ignoreMissingFiles": "true",
"spark.databricks.delta.schema.autoMerge.enabled": "true"
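For context, these options would sit in the pipeline's `configuration` map in the DLT settings JSON, roughly like this (a sketch; the key names and values are the ones quoted above):

```json
{
  "configuration": {
    "spark.sql.files.ignoreMissingFiles": "true",
    "spark.databricks.delta.schema.autoMerge.enabled": "true"
  }
}
```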
But it throws this error when I try to refresh the pipeline. It also fails when I do a full refresh.
terminated with exception: Detected schema change:
Please try restarting the query. If this issue repeats across query restarts without
making progress, you have made an incompatible schema change and need to start your
query from scratch using a new checkpoint directory.
When I try to readStream with an explicit schema,
schema = spark.read.table("db_name.table_name").schema
df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", "true").option("startingVersion", xx).table("db_name.table_name")
it throws the following error:
pyspark.sql.utils.AnalysisException: User specified schema not supported with `table`
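For what it's worth, the Delta streaming source does not accept a user-specified schema at all; it always reads the schema from the table's transaction log, so the `.schema(...)` call can simply be dropped. A minimal sketch of the supported pattern (the table name and starting version are placeholders from the post; `skipChangeCommits` is the newer replacement for the deprecated `ignoreChanges` and assumes DBR 12.1+ — on older runtimes keep `ignoreChanges`):

```python
# Sketch: stream from a Delta table without a user-specified schema.
# Delta infers the schema from the table's transaction log, so .schema(...)
# is not needed (and is rejected when combined with .table()).
df = (
    spark.readStream
    .format("delta")
    # skipChangeCommits skips commits that rewrite files (e.g. overwrites)
    # instead of failing; ignoreChanges instead re-emits rewritten rows.
    .option("skipChangeCommits", "true")
    .option("startingVersion", "xx")  # placeholder version from the post
    .table("db_name.table_name")
)
```

Note that an incompatible schema change (such as an overwrite that drops or retypes columns) still requires restarting the query from a new checkpoint directory, as the first error message says.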
- Labels: DLT, Schema, Source Table
06-05-2023 12:31 AM
Hi, could you please confirm the DLT and DBR versions?
Also, please tag @Debayan in your next response, which will notify me. Thank you!

