Data Engineering

Returning and reusing the identity value

dpc
New Contributor III
Hello
 
I have a table with a column defined as an identity (BIGINT GENERATED ALWAYS AS IDENTITY).
I will be inserting rows into this table in parallel.
How can I get the generated identity value and use it within a pipeline?
Parallel is relevant because there will be multiple inserts feeding multiple next steps at the same time.
e.g. workflow (multiple streams in parallel): Task 1 - insert a row; Task 2 - insert rows into another table including the identity value from Task 1; Task 3 - insert rows into another table including the identity value from Task 1; etc.
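For reference, the table is created roughly like this (table and column names here are just placeholders, only the identity column matters):

CREATE TABLE batch_log (
  batch_id   BIGINT GENERATED ALWAYS AS IDENTITY,
  created_at TIMESTAMP
);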
 
In SQL Server, I would just insert a row and return @@identity, then pass that value around using stored procedure(s).
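This is the SQL Server pattern I mean (illustrative names only):

INSERT INTO dbo.batch_log (created_at) VALUES (SYSDATETIME());
SELECT @@IDENTITY;  -- or SCOPE_IDENTITY(), which is limited to the current scope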
 
Thanks

szymon_dybczak
Contributor III

Hi @dpc ,

What you're trying to achieve doesn't make sense in the context of identity columns. Take a look at the entry below from the documentation. So, the answer is: if you want concurrent transactions, don't use identity columns 🙂

Declaring an identity column on a Delta table disables concurrent transactions. Only use identity columns in use cases where concurrent writes to the target table are not required.

dpc
New Contributor III

Thanks Slash

In this case though, the batch generation is not concurrent - it's sequential - but the full batch run can be concurrent (if that makes sense).

So, I could be running 5 batches in parallel (not necessarily starting at the same time) and all 5 generate a different id.

The batches can differ in what they do, but the key here is that, where required, each one records the batch id relevant to its own batch - so the id is recorded consistently throughout, in any table writes where it's needed.

There are some suggestions elsewhere that you can generate one and then just read back the last batch id.

That wouldn't work here
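
Just to make the shape of it concrete, this is roughly the pattern I'm after (hypothetical names; it assumes an extra batch_key column that each run already knows, e.g. a workflow run id passed in as a parameter), so each run looks up its own row rather than "the last one":

-- Task 1: create the batch row for this run
INSERT INTO batch_log (batch_key, created_at)
VALUES (:batch_key, current_timestamp());

-- Task 1: read back this run's generated id by its own key, not the last id
SELECT batch_id FROM batch_log WHERE batch_key = :batch_key;

-- Tasks 2..n: stamp that id on their writes to other tables
INSERT INTO other_table (batch_id, some_col)
SELECT :batch_id, some_col FROM staging_source;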
