3 weeks ago
Hello, I have an issue with overwriting a schema while using writeStream. I do not receive any error, but the schema remains unchanged.
Example below:
from pyspark.sql.functions import col

df_abc = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", chklocat)
    .load(deltatbl))
df_abc = df_abc.withColumn("columna", col("columna").cast("timestamp"))
write = (df_abc.writeStream
    .outputMode("append")
    .option("checkpointLocation", chklocat)
    .trigger(availableNow=True)
    .option("overwriteSchema", "true")
    .toTable(dbname + "." + tblname))
3 weeks ago
Hi @lakime, it seems you're encountering an issue with schema overwriting while using writeStream in PySpark.
Let’s troubleshoot this together!
Two things to check:
- Boolean value for overwriteSchema
- Schema migration
Hopefully, this helps resolve the schema overwriting issue! 🚀
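For reference, a minimal sketch of the context in which overwriteSchema is actually honored: it is a Delta *batch* writer option that takes effect together with mode("overwrite"). The function and table names below are illustrative assumptions, not part of the original thread.

```python
def overwrite_with_new_schema(df, target_table):
    """Rewrite a Delta table, replacing its schema (sketch).

    `overwriteSchema` is a Delta batch-writer option; it only takes
    effect together with mode("overwrite"). `target_table` is an
    illustrative assumption, e.g. "mydb.mytbl".
    """
    (df.write
       .format("delta")
       .mode("overwrite")                  # required for schema replacement
       .option("overwriteSchema", "true")  # allow retyping/dropping columns
       .saveAsTable(target_table))
```

This replaces the table's contents as well as its schema, so it is a batch operation rather than a drop-in streaming option.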
3 weeks ago
Hey Kaniz, not sure I follow:
- the overwriteSchema option was set up as you wrote
- the session configuration is set up correctly
I have also tried several other configurations, including setting "mergeSchema" to "true", but it still doesn't work.
3 weeks ago
Hi @lakime, you're encountering schema overwriting issues while using writeStream in Databricks.
Let's troubleshoot this together!
Things to check:
- Correct option placement
- Avoid writing data twice
- Consider the mergeSchema option
- Check for table ACLs
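Since mergeSchema generally handles only additive changes (new columns) and not retyping an existing column, one commonly suggested workaround is foreachBatch: inside it each micro-batch is an ordinary non-streaming DataFrame, so batch-only options such as overwriteSchema apply. This is a hedged sketch under assumed names (the table name and helper functions are illustrative, not from the thread).

```python
def write_batch(batch_df, batch_id):
    """Batch writer invoked once per micro-batch by foreachBatch.

    Inside foreachBatch the DataFrame is non-streaming, so the batch
    writer's overwriteSchema option applies. "mydb.mytbl" is an
    illustrative assumption.
    """
    (batch_df.write
        .format("delta")
        .mode("overwrite")
        .option("overwriteSchema", "true")
        .saveAsTable("mydb.mytbl"))

def start(df_abc, chklocat):
    """Wire the batch writer into the streaming query (sketch)."""
    return (df_abc.writeStream
        .foreachBatch(write_batch)
        .option("checkpointLocation", chklocat)
        .trigger(availableNow=True)
        .start())
```

Note that mode("overwrite") replaces the table contents on every micro-batch; with a one-shot availableNow trigger that may be the intent, but otherwise you would switch to append after the first overwrite.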
3 weeks ago
That did not solve the problem.