Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Best practices for loading a single Delta table in parallel from multiple processes

Anonymous47
New Contributor II

Hi all,

I have a Delta Lake table created with an identity column. It is not possible to load data into this table in parallel from multiple processes, because the concurrent writes fail with MetadataChangedException.

According to another community post, we can catch the exception and retry the write a number of times. But for large tables where each write takes a long time to finish, couldn't the write still fail even after retrying?
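To make the retry idea concrete, here is a minimal sketch of what I understand by it. The names `retry_write` and `write_fn` and the backoff parameters are hypothetical, not from any Databricks API. In a real job I assume `retryable` would be the Delta concurrency exception (for example `delta.exceptions.MetadataChangedException`) instead of the generic default.

```python
import random
import time


def retry_write(write_fn, retryable=(Exception,), max_attempts=5,
                base_delay=1.0, sleep=time.sleep):
    """Call write_fn until it succeeds or max_attempts is reached.

    write_fn   -- hypothetical zero-argument callable that performs the write
    retryable  -- exception types that should trigger another attempt
    base_delay -- delay in seconds before the second attempt; doubles each time
    sleep      -- injectable so the backoff can be observed or skipped in tests
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last conflict
            # Exponential backoff with jitter, so competing writers are
            # less likely to collide again on their next attempt.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay))
```

Usage would be something like `retry_write(lambda: df.write.format("delta").mode("append").saveAsTable("my_table"))`, where `df` and `my_table` are placeholders. Every retry repeats the whole write, which is why I am worried about this approach for long-running loads.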

I would like to understand:
1) What are the best practices for this use case, or for parallel writes in general? When should row-level concurrency be used?
2) Is there any way to generate sequential numbers without using an identity column? UUIDs and monotonically_increasing_id() do not produce a sequential series.
3) Is there any enhancement in the pipeline to introduce an equivalent of Oracle sequences? The current sequence function in Databricks works differently.

