<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspark? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59670#M31471</link>
    <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I currently have a Delta folder as a table with several columns that are nullable. I want to migrate data to the table, overwrite its contents using PySpark, add several new columns, and make them non-nullable. I have found a way to mark the columns in the PySpark DataFrame as non-nullable:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType

non_nullable_schema = StructType([
    StructField("column1", StringType(), nullable=False),
    StructField("column2", StringType(), nullable=False),
])

# Apply the new schema to the DataFrame
non_nullable_df = spark.createDataFrame(df.rdd, non_nullable_schema)&lt;/LI-CODE&gt;&lt;P&gt;But after I write to the existing Delta destination folder and load it again, printing the schema shows that the columns are nullable again:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;non_nullable_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/path/to/delta/files")
df_read = spark.read.format("delta").load("/path/to/delta/files")
df_read.printSchema()&lt;/LI-CODE&gt;&lt;P&gt;Is there any way to change an existing schema in Delta to non-nullable using PySpark without creating a new Delta table?&lt;/P&gt;</description>
    <pubDate>Thu, 08 Feb 2024 10:19:53 GMT</pubDate>
    <dc:creator>Red_blue_green</dc:creator>
    <dc:date>2024-02-08T10:19:53Z</dc:date>
    <item>
      <title>Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspark?</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59670#M31471</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I currently have a Delta folder as a table with several columns that are nullable. I want to migrate data to the table, overwrite its contents using PySpark, add several new columns, and make them non-nullable. I have found a way to mark the columns in the PySpark DataFrame as non-nullable:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;from pyspark.sql.types import StructType, StructField, StringType

non_nullable_schema = StructType([
    StructField("column1", StringType(), nullable=False),
    StructField("column2", StringType(), nullable=False),
])

# Apply the new schema to the DataFrame
non_nullable_df = spark.createDataFrame(df.rdd, non_nullable_schema)&lt;/LI-CODE&gt;&lt;P&gt;But after I write to the existing Delta destination folder and load it again, printing the schema shows that the columns are nullable again:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;non_nullable_df.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save("/path/to/delta/files")
df_read = spark.read.format("delta").load("/path/to/delta/files")
df_read.printSchema()&lt;/LI-CODE&gt;&lt;P&gt;Is there any way to change an existing schema in Delta to non-nullable using PySpark without creating a new Delta table?&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 10:19:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59670#M31471</guid>
      <dc:creator>Red_blue_green</dc:creator>
      <dc:date>2024-02-08T10:19:53Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspar</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59674#M31473</link>
      <description>&lt;P&gt;You could save the DataFrame as a table instead of a Delta file and then alter the table to set the column to NOT NULL:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;table = &amp;lt;your_table_name&amp;gt;
column_name = &amp;lt;column_name&amp;gt;

non_nullable_df.write.saveAsTable(table, mode="overwrite")

spark.sql(f"ALTER TABLE {table} ALTER COLUMN {column_name} SET NOT NULL")&lt;/LI-CODE&gt;&lt;P&gt;Make sure there are no null values in the column you want to make non-nullable; otherwise the ALTER statement will fail with an error.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 10:49:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59674#M31473</guid>
      <dc:creator>Husky</dc:creator>
      <dc:date>2024-02-08T10:49:26Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspar</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59678#M31474</link>
      <description>&lt;P&gt;Thank you for the suggestion, but my current constraint is unfortunately that I have to work with Delta files, so saving as a table would not be enough.&lt;/P&gt;</description>
      <pubDate>Thu, 08 Feb 2024 11:00:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/59678#M31474</guid>
      <dc:creator>Red_blue_green</dc:creator>
      <dc:date>2024-02-08T11:00:30Z</dc:date>
    </item>
    <item>
      <title>Re: Databricks: Change the existing schema of columns to non-nullable for a delta table using Pyspar</title>
      <link>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/68485#M33691</link>
      <description>&lt;P&gt;Not sure if you found a solution, but you can also try the following. In this case you pass the full path to the Delta files rather than the table name:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;spark.sql(f"ALTER TABLE delta.`{full_delta_path}` ALTER COLUMN {column_name} SET NOT NULL")&lt;/LI-CODE&gt;</description>
      <pubDate>Tue, 07 May 2024 18:37:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/databricks-change-the-existing-schema-of-columns-to-non-nullable/m-p/68485#M33691</guid>
      <dc:creator>kanjinghat</dc:creator>
      <dc:date>2024-05-07T18:37:13Z</dc:date>
    </item>
  </channel>
</rss>