Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

ProtocolChangedException on concurrent blind appends to delta table

MiguelKulisic
New Contributor II

Hello,

I am developing an application in which multiple processes write their results to a common Delta table as blind appends. According to the docs I've read (https://docs.databricks.com/delta/concurrency-control.html#protocolchangedexception and https://docs.delta.io/0.4.0/delta-concurrency.html), append-only writes should never cause concurrency conflicts. However, I am running into the following error:

ProtocolChangedException: The protocol version of the Delta table has been changed by a concurrent update. This happens when multiple writers are writing to an empty directory. Creating the table ahead of time will avoid this conflict. Please try the operation again. Conflicting commit: {"timestamp":1642800186194,"userId":"61587887627726","userName":"USERNAME","operation":"WRITE","operationParameters":{"mode":Append,"partitionBy":["Date","GroupId","Scope"]},"notebook":{"notebookId":"241249750697631"},"clusterId":"","isolationLevel":"WriteSerializable","isBlindAppend":true,"operationMetrics":{"numFiles":"56","numOutputBytes":"267086","numOutputRows":"61"}} Refer to https://docs.microsoft.com/azure/databricks/delta/concurrency-control for more details.

Some more information for context:

  1. The code that writes the data is:

def saveToDeltaTable(ds: Dataset[Class], dtPath: String): Unit = {
  ds.write.format("delta")
    .partitionBy("Date", "GroupId", "Scope")
    .option("mergeSchema", "true")
    .mode("append")
    .save(dtPath)
}
  2. I'm unable to reproduce this consistently.
  3. Both writes currently run on the same cluster, though eventually they won't.
  4. The partition columns "Date" and "GroupId" have the same values for each write, but the partition column "Scope" differs between writes.

Given the description of ProtocolChangedException in the docs, it doesn't make sense to me that this operation fails. My only thought is that it could be caused by the mergeSchema flag, even though it currently does nothing.
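For reference, the error message itself suggests creating the table ahead of time so that the first concurrent append is not also a table creation. A minimal sketch of that, assuming the Delta Lake DeltaTableBuilder API (Delta 1.0+) and that dtPath and the column list match the real dataset (only the partition columns are shown; the rest are placeholders):

```scala
// Sketch: pre-create the Delta table, with its full schema and partitioning,
// before any concurrent writer starts. Concurrent blind appends then race on
// an existing table instead of both trying to create it.
import io.delta.tables.DeltaTable
import org.apache.spark.sql.types._

val schema = StructType(Seq(
  StructField("Date", DateType),
  StructField("GroupId", StringType),
  StructField("Scope", StringType)
  // ...remaining columns of Dataset[Class]
))

DeltaTable.createIfNotExists(spark)
  .location(dtPath)
  .addColumns(schema)
  .partitionedBy("Date", "GroupId", "Scope")
  .execute()
```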

Thank you,

Miguel

1 ACCEPTED SOLUTION


-werners-
Esteemed Contributor III

I think you are right: mergeSchema can change the schema of the table. But if both processes write to the same table with different schemas, which one should win?

Can you check whether both writers actually produce the same schema, or else remove the mergeSchema option?
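In code terms, the suggestion amounts to dropping the mergeSchema option from the writer shown in the question (a sketch of the proposed change, not a tested fix):

```scala
// Sketch: without mergeSchema, appends cannot trigger a schema change,
// so concurrent blind appends no longer race on the table's schema/protocol.
// Both writers must then produce exactly the same schema.
def saveToDeltaTable(ds: Dataset[Class], dtPath: String): Unit = {
  ds.write.format("delta")
    .partitionBy("Date", "GroupId", "Scope")
    .mode("append")
    .save(dtPath)
}
```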


2 REPLIES

Anonymous
Not applicable

Hi there, @Miguel Kulisic! It's nice to meet you and thank you for coming to the community for help. We'll give the rest of the community a chance to respond before we come back to this. Thank you in advance for your patience! 🙂

