<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspark? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59670#M31471</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I currently have a Delta folder as a table with several columns that are nullable. I want to migrate data to the table, overwrite its contents using PySpark, add several new columns, and make them non-nullable. I have found a way to mark the columns in the PySpark DataFrame as non-nullable:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType

non_nullable_schema = StructType([
    StructField("column1", StringType(), nullable=False),
    StructField("column2", StringType(), nullable=False),
])

# Apply the new schema to the DataFrame
non_nullable_df = spark.createDataFrame(df.rdd, non_nullable_schema)&lt;/LI-CODE&gt;&lt;P&gt;But after I write to the existing Delta destination folder and load it again, printing the schema shows that the columns are nullable again:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;non_nullable_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/path/to/delta/files")
df_read = spark.read.format("delta").load("/path/to/delta/files")
df_read.printSchema()&lt;/LI-CODE&gt;&lt;P&gt;Is there any way to change an existing schema in Delta to non-nullable using PySpark without creating a new Delta table?&lt;/P&gt;</description>
    <pubDate>Thu, 08 Feb 2024 10:19:53 GMT</pubDate>
    <dc:creator>Red_blue_green</dc:creator>
    <dc:date>2024-02-08T10:19:53Z</dc:date>
    <item>
      <title>Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59670#M31471</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I currently have a Delta folder as a table with several columns that are nullable. I want to migrate data to the table, overwrite its contents using PySpark, add several new columns, and make them non-nullable. I have found a way to mark the columns in the PySpark DataFrame as non-nullable:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType

non_nullable_schema = StructType([
    StructField("column1", StringType(), nullable=False),
    StructField("column2", StringType(), nullable=False),
])

# Apply the new schema to the DataFrame
non_nullable_df = spark.createDataFrame(df.rdd, non_nullable_schema)&lt;/LI-CODE&gt;&lt;P&gt;But after I write to the existing Delta destination folder and load it again, printing the schema shows that the columns are nullable again:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;non_nullable_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/path/to/delta/files")
df_read = spark.read.format("delta").load("/path/to/delta/files")
df_read.printSchema()&lt;/LI-CODE&gt;&lt;P&gt;Is there any way to change an existing schema in Delta to non-nullable using PySpark without creating a new Delta table?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 10:19:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59670#M31471</guid>
      <dc:creator>Red_blue_green</dc:creator>
      <dc:date>2024-02-08T10:19:53Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspar</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59674#M31473</link>
      <description>&lt;P&gt;You could save the DataFrame as a table instead of a Delta file and then alter the table to set the column to NOT NULL:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;table = &amp;lt;your_table_name&amp;gt;
column_name = &amp;lt;column_name&amp;gt;

non_nullable_df.write.saveAsTable(table, mode="overwrite")

spark.sql(f"ALTER TABLE {table} ALTER COLUMN {column_name} SET NOT NULL")&lt;/LI-CODE&gt;&lt;P&gt;Make sure there are no null values in the column you want to make non-nullable; otherwise the ALTER statement will fail with an error.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 10:49:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59674#M31473</guid>
      <dc:creator>Husky</dc:creator>
      <dc:date>2024-02-08T10:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspar</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59678#M31474</link>
      <description>&lt;P&gt;Thank you for the suggestion, but my current constraint is unfortunately that I have to work with Delta files, so saving as a table would not be enough.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 11:00:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59678#M31474</guid>
      <dc:creator>Red_blue_green</dc:creator>
      <dc:date>2024-02-08T11:00:30Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspar</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/68485#M33691</link>
      <description>&lt;P&gt;Not sure if you found a solution, but you can also try the following. In this case you pass the full path to the Delta files rather than the table name:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql(f"ALTER TABLE delta.`{full_delta_path}` ALTER COLUMN {column_name} SET NOT NULL")&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 07 May 2024 18:37:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/68485#M33691</guid>
      <dc:creator>kanjinghat</dc:creator>
      <dc:date>2024-05-07T18:37:13Z</dc:date>
    </item>
  </channel>
</rss>