Databricks: How to change existing columns of a Delta table to non-nullable using PySpark?

I currently have a Delta folder that serves as a table, with several nullable columns. I want to migrate data into it, overwriting the content using PySpark, add several new columns, and make them non-nullable. I have found a way to make the columns in the PySpark DataFrame non-nullable:

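from pyspark.sql.types import StructType, StructField, StringType

# One StructField per column of df, in the same order as df's columns
non_nullable_schema = StructType([
    StructField("column1", StringType(), nullable=False),
    StructField("column2", StringType(), nullable=False),
])

# Apply the new schema to the DataFrame
non_nullable_df = spark.createDataFrame(df.rdd, non_nullable_schema)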

However, after I write to the existing Delta destination folder and load it again, printing the schema shows the columns as nullable again:

non_nullable_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/path/to/delta/files")
spark.read.format("delta").load("/path/to/delta/files").printSchema()

Is there any way to change the existing schema of a Delta table so that these columns are non-nullable, using PySpark and without creating a new Delta table?


You could save the DataFrame as a table instead of writing it to a Delta path, and then alter the table to set the columns to NOT NULL:

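table = "<your_table_name>"
column_name = "<column_name>"  # column to make non-nullable

# Save as a Delta table so it can be referenced by name in SQL
non_nullable_df.write.saveAsTable(table, format="delta", mode="overwrite")

# Add the NOT NULL constraint; this fails if the column already contains nulls
spark.sql(f"ALTER TABLE {table} ALTER COLUMN {column_name} SET NOT NULL")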

Make sure the column you want to make non-nullable contains no null values; otherwise the ALTER TABLE statement fails with an error.

Thank you for the suggestion, but unfortunately my current constraint is that I have to work with Delta files, so saving as a table would not be enough.

