
overwriteSchema + writeStream

lakime
New Contributor III

Hello, I have an issue with overwriting the schema while using writeStream: no error is raised, but the schema remains unchanged.

Example below:

from pyspark.sql.functions import col

df_abc = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", chklocat)
    .load(deltatbl))

df_abc = df_abc.withColumn("columna", col("columna").cast("timestamp"))

write = (df_abc.writeStream
    .outputMode("append")
    .option("checkpointLocation", chklocat)
    .trigger(availableNow=True)
    .option("overwriteSchema", "true")
    .toTable(dbname + "." + tblname))


Kaniz
Community Manager

Hi @lakime, it seems you're encountering an issue with schema overwriting while using writeStream in PySpark.

 

Let’s troubleshoot this together!

 

Boolean Value for overwriteSchema:

  • The overwriteSchema option expects a string value, not a Python boolean: pass "true" (with quotes) rather than the literal True.

Schema Migration:

  • If you still face issues, enable schema migration on the DataFrameWriter or DataStreamWriter.
  • Set .option("mergeSchema", "true") to enable schema migration (see the sketch below).
  • Additionally, ensure that the session configuration spark.databricks.delta.schema.autoMerge.enabled is set to "true".
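Putting those two together, here is a minimal sketch of the streaming write with schema migration enabled; it reuses the variable names from your post (df_abc, chklocat, dbname, tblname) and assumes a Delta target:

    # Allow additive schema changes to be merged into the target table
    spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

    (df_abc.writeStream
        .outputMode("append")
        .option("checkpointLocation", chklocat)
        .option("mergeSchema", "true")   # merge new columns into the existing schema
        .trigger(availableNow=True)
        .toTable(dbname + "." + tblname))

Note that mergeSchema handles additive changes (new columns); it does not rewrite the type of an existing column.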

Hopefully, this helps resolve the schema overwriting issue! 🚀

lakime
New Contributor III

Hey Kaniz, I'm not sure I follow:

- the overwriteSchema option was already set exactly as you describe

- the session configuration is set up correctly

I have also tried several other configurations, including setting ("mergeSchema", "true"), but it still doesn't work.

Kaniz
Community Manager

Hi @lakime, you're encountering schema overwriting issues while using writeStream in Databricks.

 

Let's troubleshoot this together!

 

Correct Option Placement:

  • The overwriteSchema option should be set in the write operation, not the read operation.
  • Make sure to set it when saving the data to the Delta table, not while reading (see the sketch below).
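For context, overwriteSchema is honored when a Delta table is overwritten in a batch write; a minimal sketch, where df_batch is a hypothetical batch DataFrame and dbname/tblname come from your post:

    # overwriteSchema takes effect on a batch overwrite of the Delta table,
    # replacing the table schema along with the data
    (df_batch.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable(dbname + "." + tblname))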

Avoid Writing Data Twice:

  • In your example, you're writing the data twice: once to a plain directory and then as a managed table.
  • If you want to create an unmanaged table in a custom location, add the path option to the write (see the sketch below).
  • You can omit the dbfs:/ prefix since it's the default scheme.
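A minimal sketch of the unmanaged-table variant; the location below is hypothetical:

    # Adding a path option makes toTable create an unmanaged (external)
    # table at that location; "dbfs:/" can be omitted as the default scheme
    (df_abc.writeStream
        .outputMode("append")
        .option("checkpointLocation", chklocat)
        .option("path", "/mnt/datalake/tables/" + tblname)  # hypothetical location
        .trigger(availableNow=True)
        .toTable(dbname + "." + tblname))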

Consider the mergeSchema Option:

  • If your schema changes only add columns or are otherwise minor, you can use mergeSchema instead of overwriteSchema.
  • mergeSchema merges new columns into the existing schema without rewriting the table.
  • Adjust your code according to the nature of your schema changes.

Check for Table ACLs:

  • If you have Table ACLs enabled, schema changes may require additional permissions.
  • Ensure that the necessary permissions on your Delta table (e.g., MODIFY and OWN) are granted correctly (see the sketch below).
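A minimal sketch of granting the relevant privilege, assuming Table ACLs are enabled; the principal name is hypothetical:

    # MODIFY is needed to change a table's data/schema under Table ACLs;
    # OWN (ownership) covers all operations on the table
    spark.sql(f"GRANT MODIFY ON TABLE {dbname}.{tblname} TO `user@example.com`")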

Remember to apply these adjustments to your code; hopefully, that resolves the issue. If you run into further challenges, feel free to ask! 🚀

lakime
New Contributor III

That did not solve the problem.

 
