10-30-2022 03:29 AM
Hi!
I have a delta table and a process that reading a stream from this table.
I need to drop the NOT NULL constraint from some of the columns of this table.
The first drop command does not affect the reading stream.
But the second command results in error:
StreamingQueryException: Detected schema change:
old schema: root
-- a: string (nullable = true)
-- b: string (nullable = true)
-- c: timestamp (nullable = false)
new schema: root
-- a: string (nullable = true)
-- b: string (nullable = false)
-- c: timestamp (nullable = false)
Full python notebook:
# Databricks notebook source
# MAGIC %sql
# MAGIC CREATE SCHEMA temp_schema
# COMMAND ----------
# MAGIC %sql
# MAGIC CREATE TABLE temp_schema.temp_stream_table (
# MAGIC a STRING NOT NULL,
# MAGIC b STRING NOT NULL,
# MAGIC c TIMESTAMP NOT NULL
# MAGIC )
# MAGIC LOCATION "dbfs:/tmp/temp_stream_table"
# COMMAND ----------
# MAGIC %sql
# MAGIC INSERT INTO temp_schema.temp_stream_table
# MAGIC VALUES ("a", "b", CURRENT_DATE), ("c", "d", CURRENT_DATE)
# COMMAND ----------
def run_stream():
return (
spark
.readStream
.option("mergeSchema", "true")
.table("temp_schema.temp_stream_table")
.writeStream
.trigger(availableNow=True)
.option("checkpointLocation", "/dbfs/tmp/temp_stream_checkpoint")
.foreachBatch(lambda df, _id: None)
.start()
.awaitTermination()
)
# COMMAND ----------
run_stream()
# COMMAND ----------
# MAGIC %sql
# MAGIC ALTER TABLE temp_schema.temp_stream_table
# MAGIC ALTER COLUMN a DROP NOT NULL;
# COMMAND ----------
run_stream()
# Success!
# COMMAND ----------
# MAGIC %sql
# MAGIC ALTER TABLE temp_schema.temp_stream_table
# MAGIC ALTER COLUMN b DROP NOT NULL;
# COMMAND ----------
run_stream()
### Failure here: StreamingQueryException: Detected schema change ...
# COMMAND ----------
# MAGIC %sql
# MAGIC DROP TABLE temp_schema.temp_stream_table;
# MAGIC DROP SCHEMA temp_schema
# COMMAND ----------
dbutils.fs.rm("dbfs:/tmp/temp_stream_checkpoint", True)
dbutils.fs.rm("dbfs:/tmp/temp_stream_table", True)
Is it a bug?
10-31-2022 06:07 AM
Hi @Kaniz Fatma ,
I cleanup the checkpoint directory in rows 71-12, it should be empty.
Also no new data is inserted into table, the error is on read.
10-31-2022 06:13 AM
Yes, for field "b". nullable=true vs nullable=false
10-31-2022 06:45 AM
DBR 10.4 LTS.
Also tested on 11.3 LTS - same result.
10-31-2022 07:04 AM
@Kaniz Fatma The command used to drop constraint is:
ALTER TABLE temp_schema.temp_stream_table
ALTER COLUMN a DROP NOT NULL;
This means, I want to make the column nullable, not to drop it.
The error appears on readStream step, not on drop constraint.
01-08-2023 09:59 PM
Hi @Anatoly Tikhonov
Hope everything is going great.
Does @Kaniz Fatma response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?
We'd love to hear from you.
Thanks!
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group