"Detected schema change" error while reading from delta table in streaming after applying "ALTER COLUMN DROP NOT NULL" to more than one columns.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-30-2022 03:29 AM
Hi!
I have a delta table and a process that reading a stream from this table.
I need to drop the NOT NULL constraint from some of the columns of this table.
The first drop command does not affect the reading stream.
But the second command results in error:
StreamingQueryException: Detected schema change:
old schema: root
-- a: string (nullable = true)
-- b: string (nullable = true)
-- c: timestamp (nullable = false)
new schema: root
-- a: string (nullable = true)
-- b: string (nullable = false)
-- c: timestamp (nullable = false)
Full python notebook:
# Databricks notebook source
# MAGIC %sql
# MAGIC CREATE SCHEMA temp_schema
# COMMAND ----------
# MAGIC %sql
# MAGIC CREATE TABLE temp_schema.temp_stream_table (
# MAGIC a STRING NOT NULL,
# MAGIC b STRING NOT NULL,
# MAGIC c TIMESTAMP NOT NULL
# MAGIC )
# MAGIC LOCATION "dbfs:/tmp/temp_stream_table"
# COMMAND ----------
# MAGIC %sql
# MAGIC INSERT INTO temp_schema.temp_stream_table
# MAGIC VALUES ("a", "b", CURRENT_DATE), ("c", "d", CURRENT_DATE)
# COMMAND ----------
def run_stream():
return (
spark
.readStream
.option("mergeSchema", "true")
.table("temp_schema.temp_stream_table")
.writeStream
.trigger(availableNow=True)
.option("checkpointLocation", "/dbfs/tmp/temp_stream_checkpoint")
.foreachBatch(lambda df, _id: None)
.start()
.awaitTermination()
)
# COMMAND ----------
run_stream()
# COMMAND ----------
# MAGIC %sql
# MAGIC ALTER TABLE temp_schema.temp_stream_table
# MAGIC ALTER COLUMN a DROP NOT NULL;
# COMMAND ----------
run_stream()
# Success!
# COMMAND ----------
# MAGIC %sql
# MAGIC ALTER TABLE temp_schema.temp_stream_table
# MAGIC ALTER COLUMN b DROP NOT NULL;
# COMMAND ----------
run_stream()
### Failure here: StreamingQueryException: Detected schema change ...
# COMMAND ----------
# MAGIC %sql
# MAGIC DROP TABLE temp_schema.temp_stream_table;
# MAGIC DROP SCHEMA temp_schema
# COMMAND ----------
dbutils.fs.rm("dbfs:/tmp/temp_stream_checkpoint", True)
dbutils.fs.rm("dbfs:/tmp/temp_stream_table", True)
Is it a bug?
- Labels:
-
Delta table
-
Schema Change
-
Table
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-31-2022 06:07 AM
Hi @Kaniz Fatma ,
I cleanup the checkpoint directory in rows 71-12, it should be empty.
Also no new data is inserted into table, the error is on read.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-31-2022 06:13 AM
Yes, for field "b". nullable=true vs nullable=false
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-31-2022 06:45 AM
DBR 10.4 LTS.
Also tested on 11.3 LTS - same result.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-31-2022 07:04 AM
@Kaniz Fatma The command used to drop constraint is:
ALTER TABLE temp_schema.temp_stream_table
ALTER COLUMN a DROP NOT NULL;
This means, I want to make the column nullable, not to drop it.
The error appears on readStream step, not on drop constraint.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
01-08-2023 09:59 PM
Hi @Anatoly Tikhonov
Hope everything is going great.
Does @Kaniz Fatma response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!