Hello,
I currently have a Delta folder as a table with several nullable columns. I want to migrate data to the table and overwrite its content using PySpark, add several new columns, and make them not nullable. I have found a way to mark the columns in the PySpark DataFrame as non-nullable:
from pyspark.sql.types import StructType, StructField, StringType

non_nullable_schema = StructType([
StructField("column1", StringType(), nullable=False),
StructField("column2", StringType(), nullable=False),
])
# Apply the new schema to the DataFrame
non_nullable_df = spark.createDataFrame(df.rdd, non_nullable_schema)
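For reference, printing the schema of this DataFrame at this point does show the fields as non-nullable (output roughly like this):
non_nullable_df.printSchema()
# root
#  |-- column1: string (nullable = false)
#  |-- column2: string (nullable = false)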
But after I write to the existing Delta destination folder and load it again, printing the schema shows that the columns are nullable again:
non_nullable_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/path/to/delta/files")
df_read=spark.read.format("delta").load("/path/to/delta/files")
df_read.printSchema()
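The output looks roughly like this, with the columns back to nullable:
root
 |-- column1: string (nullable = true)
 |-- column2: string (nullable = true)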
Is there any way to change the schema of an existing Delta table to not nullable using PySpark, without creating a new Delta table?