ProtocolChangedException on concurrent blind appends to delta table

MiguelKulisic
New Contributor II

Hello,

I am developing an application that runs multiple processes, each of which writes its results to a common Delta table as blind appends. According to the docs I've read online (https://docs.databricks.com/delta/concurrency-control.html#protocolchangedexception and https://docs.delta.io/0.4.0/delta-concurrency.html), append-only writes should never cause concurrency conflicts. Nevertheless, I am running into the following error:

ProtocolChangedException: The protocol version of the Delta table has been changed by a concurrent update. This happens when multiple writers are writing to an empty directory. Creating the table ahead of time will avoid this conflict. Please try the operation again. Conflicting commit: {"timestamp":1642800186194,"userId":"61587887627726","userName":"USERNAME","operation":"WRITE","operationParameters":{"mode":Append,"partitionBy":["Date","GroupId","Scope"]},"notebook":{"notebookId":"241249750697631"},"clusterId":"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"56","numOutputBytes":"267086","numOutputRows":"61"}} Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.
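The error text itself suggests creating the table ahead of time. A minimal sketch of that idea follows, assuming a single setup step runs once before any of the concurrent writers start. The object name `PrecreateDeltaTable` and the helper `ensureTableExists` are hypothetical, and the schema is whatever the writers will actually append; `DeltaTable.isDeltaTable` comes from the Delta Lake Scala API.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical helper, not part of the original post.
object PrecreateDeltaTable {

  /** Creates an empty Delta table at `dtPath` if none exists there yet.
    *
    * Assumption: this is called once, from one process, before the
    * concurrent appenders are launched. Calling it from several processes
    * at the same time would reintroduce the race it is meant to avoid.
    */
  def ensureTableExists(
      spark: SparkSession,
      schema: StructType,
      dtPath: String,
      partitionCols: Seq[String]
  ): Unit = {
    if (!DeltaTable.isDeltaTable(spark, dtPath)) {
      // An empty DataFrame with the target schema: the write commits the
      // table metadata and protocol without adding any data files.
      spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
        .write.format("delta")
        .partitionBy(partitionCols: _*)
        .mode("append")
        .save(dtPath)
    }
  }
}
```

With the partition columns from this thread, the call would look like `PrecreateDeltaTable.ensureTableExists(spark, schema, dtPath, Seq("Date", "GroupId", "Scope"))`. Whether this removes the exception in this particular setup is not confirmed in the thread; it only follows the hint in the error message.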

Some more information for context:

  1. The code that writes the data is:
def saveToDeltaTable(ds: Dataset[Class], dtPath: String) = {
    ds.write.format("delta")
       .partitionBy("Date", "GroupId", "Scope")
       .option("mergeSchema", "true")
       .mode("append")
       .save(dtPath)
}
  2. I'm unable to reproduce this consistently.
  3. Both writes are currently running on the same cluster. Eventually they won't be.
  4. The partition columns "Date" and "GroupId" have the same values for each write, but the partition column "Scope" differs between writes.

Based on the description of "ProtocolChangedException" in the docs, it doesn't make much sense to me that this write is failing. My only thought is that it could be due to the mergeSchema option, even though it currently changes nothing because every write uses the same schema.

Thank you,

Miguel

1 ACCEPTED SOLUTION

-werners-
Esteemed Contributor III

I think you are right: mergeSchema can change the schema of the table, but if two writers append to that same table with different schemas, which schema should win?

Can you check whether both writers actually write the same schema, or remove the mergeSchema option?
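Both suggestions could be sketched roughly as follows. This is an illustration only: the object `SchemaCheck` and its methods are hypothetical names, the table is assumed to already exist at `dtPath`, and "same schema" is taken here to mean the same set of column names and data types, ignoring column order and nullability.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical helpers, not part of the original post.
object SchemaCheck {

  /** Reduces a schema to its (column name, data type) pairs.
    * Assumption: order, nullability and field metadata are not compared.
    */
  def schemaShape(schema: StructType): Set[(String, String)] =
    schema.fields.map(f => (f.name, f.dataType.simpleString)).toSet

  /** True if `ds` has the same columns and types as the Delta table at `dtPath`. */
  def sameSchemaAsTable(spark: SparkSession, ds: Dataset[_], dtPath: String): Boolean = {
    val tableSchema = spark.read.format("delta").load(dtPath).schema
    schemaShape(tableSchema) == schemaShape(ds.schema)
  }

  /** The append from the question with the mergeSchema option left out,
    * so a write that carries extra columns is rejected rather than
    * silently evolving the table schema.
    */
  def appendWithoutMergeSchema(ds: Dataset[_], dtPath: String): Unit =
    ds.write.format("delta")
      .partitionBy("Date", "GroupId", "Scope")
      .mode("append")
      .save(dtPath)
}
```

Logging the result of `sameSchemaAsTable` in each writer just before the append would show whether the processes really do agree on the schema.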

2 REPLIES 2

Anonymous
Not applicable

Hi there, @Miguel Kulisic! It's nice to meet you and thank you for coming to the community for help. We'll give the rest of the community a chance to respond before we come back to this. Thank you in advance for your patience! 🙂
