06-23-2022 10:38 AM
I created a Delta Lake table with an identity column, and I'm not able to load data into it in parallel from four processes. I'm getting a metadata exception error.
I don't want to load the data into a temp table first. I need to load it directly, and in parallel, into the Delta table.
06-23-2022 11:43 AM
@Gokul K, the identity is stored in the table schema (which is an awful solution). That's why concurrent inserts are not supported.
I even recorded a video about that problem: Delta Identity Column with Databricks 10.4 - crash test - YouTube
06-23-2022 10:48 AM
Hi @Gokul K, thank you for posting your question on the community. Would you mind sharing the error stack here?
06-23-2022 10:55 AM
MetadataChangedException: The metadata of the Delta table has been changed by a concurrent update. Please try the operation again
06-23-2022 11:11 AM
Hi @Gokul K, this exception occurs when a concurrent transaction updates the metadata of a Delta table. Common causes are ALTER TABLE operations or writes to your Delta table that update the table's schema.
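Since the error message itself says "Please try the operation again", here is a minimal sketch of a retry wrapper with exponential backoff. It is not an official Databricks helper: `retry_on_conflict`, its parameters, and the idea of matching the exception by name in the error text are all assumptions made for illustration. Retrying only spaces the writers out; it does not make concurrent inserts into an identity table supported.

```python
import random
import time


def retry_on_conflict(write_fn, max_attempts=5, base_delay=1.0,
                      conflict_names=("MetadataChangedException",),
                      sleep=time.sleep):
    """Call write_fn, retrying when the error looks like a Delta conflict.

    write_fn is any zero-argument callable that performs the write.
    The conflict is detected by name in the error text (an assumption),
    so this sketch does not depend on a specific exception class.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except Exception as exc:
            text = f"{type(exc).__name__}: {exc}"
            is_conflict = any(name in text for name in conflict_names)
            # Re-raise unrelated errors, and give up after the last attempt.
            if not is_conflict or attempt == max_attempts:
                raise
            # Exponential backoff plus jitter so writers do not retry in lockstep.
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
            sleep(delay)
```

In a notebook this might be called as `retry_on_conflict(lambda: df.write.format("delta").mode("append").saveAsTable("target"))`, where `df` and `target` are hypothetical names for the DataFrame and the table.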
06-23-2022 11:21 AM
No ALTER TABLE operations are carried out. The issue appears just from loading data from four notebooks running in parallel into the same Delta Lake table, which has ID as an identity column.
Loading the data into a temp table first and then inserting it into the target table with the identity column doesn't cause any issues.
But for some reason I need to load the data in parallel into the table that has the identity column.
06-13-2023 12:45 AM
@Hubert Dudek @Kaniz Fatma
I am experiencing the same issue. Now that I understand the reason behind it, I would appreciate your assistance in finding a solution for generating a sequence for the table. Multiple concurrent jobs will be performing insertions and updates on the same table. To address the concurrent update issue, I have partitioned the table. However, I am struggling to determine the best approach for generating the Id values. I would greatly appreciate any suggestions you can provide.
11-08-2023 09:03 AM
Even with a retry or try/except approach, there is no guarantee that the load from another parallel process has completed, especially for large tables. In such cases, repeating the write in the exception handler fails as well. What is the best possible solution for this? Is there any other way to generate an auto-incrementing id column without using the GENERATED clause in the DDL?
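One way to generate ids without an identity column, sketched here under stated assumptions rather than as a recommended Databricks pattern, is to give every writer a fixed index and interleave the id space: writer `w` out of `n` writers only ever produces ids of the form `seq * n + w`. The ids are unique across writers without any coordination, but they are not gapless and not ordered by insert time. The function names, and the assumption that each job knows its own `writer_index` and remembers its last local sequence number, are hypothetical.

```python
def interleaved_id(writer_index: int, num_writers: int, local_seq: int) -> int:
    """Map a writer's local sequence number to an id no other writer can produce."""
    if not 0 <= writer_index < num_writers:
        raise ValueError("writer_index must be in [0, num_writers)")
    if local_seq < 0:
        raise ValueError("local_seq must be non-negative")
    # Ids from writer w are exactly the integers congruent to w modulo num_writers.
    return local_seq * num_writers + writer_index


def next_ids(writer_index: int, num_writers: int, first_seq: int, n: int) -> list:
    """Return n ids for one writer, starting at its local sequence first_seq."""
    return [interleaved_id(writer_index, num_writers, seq)
            for seq in range(first_seq, first_seq + n)]


# Writer 1 of 4 producing its first three ids.
print(next_ids(1, 4, 0, 3))  # [1, 5, 9]
```

Each job would have to persist how many ids it has already handed out (the next `first_seq`) between runs, and `num_writers` has to stay fixed, otherwise the ranges can overlap.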
06-23-2022 12:01 PM
Thanks @Hubert Dudek
06-23-2022 12:08 PM
Hi @Gokul K, thank you for marking the best answer for us. We're happy to help you.