01-21-2022 01:52 PM
Hello,
I am developing an application that runs multiple processes, each of which writes its results to a common Delta table as blind appends. According to the docs I've read online (https://docs.databricks.com/delta/concurrency-control.html#protocolchangedexception and https://docs.delta.io/0.4.0/delta-concurrency.html), append-only writes should never cause concurrency issues. However, I am running into the following error:
ProtocolChangedException: The protocol version of the Delta table has been changed by a concurrent update. This happens when multiple writers are writing to an empty directory. Creating the table ahead of time will avoid this conflict. Please try the operation again. Conflicting commit: {"timestamp":1642800186194,"userId":"61587887627726","userName":"USERNAME","operation":"WRITE","operationParameters":{"mode":Append,"partitionBy":["Date","GroupId","Scope"]},"notebook":{"notebookId":"241249750697631"},"clusterId":"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"56","numOutputBytes":"267086","numOutputRows":"61"}} Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.
Some more information for context:
- The code that writes the data is:
    def saveToDeltaTable(ds: Dataset[Class], dtPath: String) = {
      ds.write.format("delta")
        .partitionBy("Date", "GroupId", "Scope")
        .option("mergeSchema", "true")
        .mode("append")
        .save(dtPath)
    }
- I'm unable to reproduce this consistently.
- Both writes currently run on the same cluster. Eventually they won't.
- The partition columns "Date" and "GroupId" have the same values for each write, but the partition column "Scope" differs between writes.
Given the description of "ProtocolChangedException" in the docs, it doesn't make much sense to me that this is failing. My only thought is that it could be due to the mergeSchema option, even though it currently does nothing.
Thank you,
Miguel
Accepted Solutions
01-25-2022 06:36 AM
I think you are right: mergeSchema can change the schema of the table. But if both writers append to the same table with different schemas, which schema should the table end up with?
Can you check whether both writers actually write the same schema, or remove mergeSchema?
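The two suggestions in this thread (check that the writers share a schema and drop mergeSchema, plus the error message's own advice to create the table ahead of time) could be sketched as below. This is a minimal sketch, not a confirmed fix: the object and helper names (`DeltaAppendSketch`, `ensureTableExists`, `schemaMatchesTable`, `appendWithoutMergeSchema`) are hypothetical, and it assumes a Delta Lake version that ships the `DeltaTable.createIfNotExists` builder (Delta Lake 1.0 or later, or a comparable Databricks runtime).

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical helper object; names are illustrative, not from the original post.
object DeltaAppendSketch {
  // Partition columns taken from the original post.
  private val partitionCols = Seq("Date", "GroupId", "Scope")

  /** Run once, from a single process, before any concurrent writer starts,
    * so that no writer ever has to initialise an empty directory. */
  def ensureTableExists(spark: SparkSession, schema: StructType, dtPath: String): Unit = {
    // Mark top-level columns nullable so the table gets no NOT NULL constraints.
    val nullableSchema = StructType(schema.fields.map(_.copy(nullable = true)))
    DeltaTable.createIfNotExists(spark)
      .location(dtPath)
      .addColumns(nullableSchema)
      .partitionedBy(partitionCols: _*)
      .execute()
    ()
  }

  /** True when the incoming columns (name and type) equal the table's columns.
    * Column order and top-level nullability are ignored. */
  def schemaMatchesTable(spark: SparkSession, incoming: StructType, dtPath: String): Boolean = {
    val tableSchema = spark.read.format("delta").load(dtPath).schema
    def columns(s: StructType) = s.fields.map(f => (f.name, f.dataType)).toSet
    columns(incoming) == columns(tableSchema)
  }

  /** Same append as in the question, minus the mergeSchema option. */
  def appendWithoutMergeSchema[T](ds: Dataset[T], dtPath: String): Unit =
    ds.write.format("delta")
      .partitionBy(partitionCols: _*)
      .mode("append")
      .save(dtPath)
}
```

A driver would call `ensureTableExists(spark, ds.schema, dtPath)` once up front; each writer could then log or fail fast when `schemaMatchesTable` returns false before calling `appendWithoutMergeSchema`. If the schemas really do need to evolve, that change is probably better made in a single, non-concurrent step.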
01-24-2022 08:04 AM
Hi there, @Miguel Kulisic! It's nice to meet you and thank you for coming to the community for help. We'll give the rest of the community a chance to respond before we come back to this. Thank you in advance for your patience! 🙂

