cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

"Detected schema change" error while reading from delta table in streaming after applying "ALTER COLUMN DROP NOT NULL" to more than one columns.

Anatoly
New Contributor III

Hi!

I have a delta table and a process that reading a stream from this table.

I need to drop the NOT NULL constraint from some of the columns of this table.

The first drop command does not affect the reading stream.

But the second command results in error:

StreamingQueryException: Detected schema change:
old schema: root
-- a: string (nullable = true)
-- b: string (nullable = true)
-- c: timestamp (nullable = false)
 
 
new schema: root
-- a: string (nullable = true)
-- b: string (nullable = false)
-- c: timestamp (nullable = false)

Full python notebook:

# Databricks notebook source
# MAGIC %sql
# MAGIC CREATE SCHEMA temp_schema
 
# COMMAND ----------
 
# MAGIC %sql
# MAGIC CREATE TABLE temp_schema.temp_stream_table (
# MAGIC   a STRING NOT NULL,
# MAGIC   b STRING NOT NULL,
# MAGIC   c TIMESTAMP NOT NULL
# MAGIC   )
# MAGIC LOCATION "dbfs:/tmp/temp_stream_table"
 
# COMMAND ----------
 
# MAGIC %sql
# MAGIC INSERT INTO temp_schema.temp_stream_table
# MAGIC VALUES ("a", "b", CURRENT_DATE), ("c", "d", CURRENT_DATE)
 
# COMMAND ----------
 
def run_stream():
    return (
      spark
     .readStream
     .option("mergeSchema", "true")
     .table("temp_schema.temp_stream_table")
     .writeStream
     .trigger(availableNow=True)
     .option("checkpointLocation", "/dbfs/tmp/temp_stream_checkpoint")
     .foreachBatch(lambda df, _id: None)
     .start()
     .awaitTermination()
    )
 
# COMMAND ----------
 
run_stream()
 
# COMMAND ----------
 
# MAGIC %sql
# MAGIC ALTER TABLE temp_schema.temp_stream_table
# MAGIC ALTER COLUMN a DROP NOT NULL;
 
# COMMAND ----------
 
run_stream()
# Success!
 
# COMMAND ----------
 
# MAGIC %sql
# MAGIC ALTER TABLE temp_schema.temp_stream_table
# MAGIC ALTER COLUMN b DROP NOT NULL;
 
# COMMAND ----------
 
run_stream()
### Failure here: StreamingQueryException: Detected schema change ...
 
# COMMAND ----------
 
# MAGIC %sql
# MAGIC DROP TABLE temp_schema.temp_stream_table;
# MAGIC DROP SCHEMA temp_schema
 
# COMMAND ----------
 
dbutils.fs.rm("dbfs:/tmp/temp_stream_checkpoint", True)
dbutils.fs.rm("dbfs:/tmp/temp_stream_table", True)

Is it a bug?

5 REPLIES 5

Anatoly
New Contributor III

Hi @Kaniz Fatma​ ,

I cleanup the checkpoint directory in rows 71-12, it should be empty.

Also no new data is inserted into table, the error is on read.

Anatoly
New Contributor III

Yes, for field "b". nullable=true vs nullable=false

Anatoly
New Contributor III

DBR 10.4 LTS.

Also tested on 11.3 LTS - same result.

Anatoly
New Contributor III

@Kaniz Fatma​  The command used to drop constraint is:

ALTER TABLE temp_schema.temp_stream_table
ALTER COLUMN a DROP NOT NULL;

This means, I want to make the column nullable, not to drop it.

The error appears on readStream step, not on drop constraint.

Anonymous
Not applicable

Hi @Anatoly Tikhonov​ 

Hope everything is going great.

Does @Kaniz Fatma​  response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly?

We'd love to hear from you.

Thanks!

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group