
Set Not null changes Data type

Box_clown
New Contributor II

Hello,

Just found this issue this week and thought I would ask. An ALTER TABLE ... ALTER COLUMN ... SET NOT NULL is changing a varchar(x) data type to a string type.

I believe this will happen in most environments, so I won't supply our exact code; the generic scripts below should reproduce it.

Create a delta table:

Script 01: CREATE TABLE [schema].[testTbl_01] (tbl_id INT NOT NULL, test_01 VARCHAR(20) NOT NULL) USING DELTA LOCATION [location]

Run a DESCRIBE TABLE or SHOW CREATE TABLE and the table will look as you'd expect.

Script 02: ALTER TABLE [schema].[testTbl_01] ADD COLUMN New_01 VARCHAR(3)

Again, DESCRIBE TABLE or SHOW CREATE TABLE shows the table as you'd expect.

Script 03: ALTER TABLE [schema].[testTbl_01] ALTER COLUMN New_01 SET NOT NULL

Run a DESCRIBE TABLE or SHOW CREATE TABLE now and the newly added column shows up as a string data type.

Our cluster is currently on DBR 14.3, which we moved to in May 2024. We noticed this new behavior this week, but I'm not 100% sure when it started; I added a new column last week and this did not happen.
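Putting the three scripts together, here's a single pasteable sketch of the repro (demo_schema is a made-up schema name, and I'm using a managed table instead of the [location] path):

CREATE TABLE demo_schema.testTbl_01 (tbl_id INT NOT NULL, test_01 VARCHAR(20) NOT NULL) USING DELTA;
ALTER TABLE demo_schema.testTbl_01 ADD COLUMN New_01 VARCHAR(3);
DESCRIBE TABLE demo_schema.testTbl_01;  -- New_01 shows as varchar(3)
ALTER TABLE demo_schema.testTbl_01 ALTER COLUMN New_01 SET NOT NULL;
DESCRIBE TABLE demo_schema.testTbl_01;  -- New_01 now shows as string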

 

3 REPLIES

Slash
Contributor

Hi @Box_clown ,

To be precise, the Delta Lake format is based on Parquet files, and for strings Parquet effectively has only one data type, which Spark exposes as StringType.

So, basically, the varchar(n) data type is, under the hood, represented as a string with a check constraint on the length of the string.
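You can see that length check in action with a quick sketch (the table and schema names here are just examples):

CREATE TABLE demo_schema.varchar_demo (c VARCHAR(3)) USING DELTA;
INSERT INTO demo_schema.varchar_demo VALUES ('abc');   -- within the 3-character limit, succeeds
INSERT INTO demo_schema.varchar_demo VALUES ('abcd');  -- longer than 3 characters, fails the length check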

So what I think is happening here is that when you alter this column, you implicitly remove that check constraint and create a new one: the null constraint.

Try adding a check constraint on the length of this column to go with your ALTER statement, something like the sketch below.
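For example (the constraint name new_01_len is made up):

ALTER TABLE [schema].[testTbl_01] ADD CONSTRAINT new_01_len CHECK (length(New_01) <= 3);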

But I agree, it's a bit weird. It looks like some kind of bug, or at the very least this behavior should be documented somewhere.

Box_clown
New Contributor II

Hello @Slash , thank you for your reply.

The constraint didn't work: it needs to be run in its own ALTER TABLE statement, and even adding it before the 'Set Not Null' still resulted in the data type changing to string.

Would you know who I'd contact to see if this is a bug? 

Slash
Contributor

Hi @Box_clown ,

I recreated your example on my cluster, and now I think the 'Set Not Null' statement only changes custom metadata on that column. It doesn't remove the check constraint on length: try to insert a value with length > 3 and you will get DELTA_EXCEED_CHAR_VARCHAR_LIMIT, as in the sketch below.
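For example, with the table from your scripts (keeping your [schema] placeholder):

INSERT INTO [schema].[testTbl_01] VALUES (1, 'test', 'abcd');  -- 'abcd' is 4 characters, so this fails with DELTA_EXCEED_CHAR_VARCHAR_LIMIT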
So you can think of it in the following way: when you define a table with a VARCHAR column, a custom "label" named VARCHAR is applied to the STRING data type, and a check constraint is created on that column. Now, when you set NOT NULL, that label is somehow removed, but the check constraint remains intact, so the column still behaves correctly.
You can read about how the varchar type is represented in the articles below.

VARCHAR and CHAR in Databricks and Delta Lake | by Kyle Hale | Medium

Writing Custom Metadata to Parquet Files and Columns with PyArrow - MungingData

And I think you can create an issue, like the one below:
[SPARK-48652] Casting Issue in Spark SQL: String Column Compared to Integer Value Yields Empty Resul...
