Data Engineering
DLT Spark readstream fails on the source table which is overwritten

gg_047320_gg_94
New Contributor II

I am reading a source table that is updated every day. Updates are usually appends/merges, but the table is occasionally overwritten for other reasons.

df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('startingVersion', xx).table('db_name.table_name')

I also have the following spark configuration in DLT settings:

"spark.sql.files.ignoreMissingFiles": "true",
"spark.databricks.delta.schema.autoMerge.enabled": "true"

But it throws the following error when I refresh the pipeline; a full refresh fails as well.

terminated with exception: Detected schema change:
Please try restarting the query. If this issue repeats across query restarts without making progress, you have made an incompatible schema change and need to start your query from scratch using a new checkpoint directory.

When I try readStream with an explicit schema,

schema = spark.read.table('db_name.table_name').schema
 
df = spark.readStream.schema(schema).format("delta").option("ignoreChanges", True).option('startingVersion', xx).table('db_name.table_name')

it throws the following error

pyspark.sql.utils.AnalysisException: User specified schema not supported with `table`

1 Reply

Debayan
Esteemed Contributor III

Hi, could you please confirm the DLT and DBR versions?

Also, please tag @Debayan in your next response, which will notify me. Thank you!
