Can I merge a Delta Lake table into an RDBMS table directly? What is the preferred way in Databricks?

Krish-685291
New Contributor III

Hi,

I am dealing with updating master data. I'll do the UPSERT operations on the Delta Lake table. But after my UPSERT is complete, I'd like to update the master data in the RDBMS table as well. Is there any support from Databricks to perform this operation effectively, in a highly performant way? There are PySpark SQL ways, as shown below, but I don't see a merge option.
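
For example, a minimal sketch of the kind of JDBC write I mean (the path, URL, and credentials are hypothetical placeholders):

# Read the upserted master data back from the Delta table
master_df = spark.read.format("delta").load("/mnt/delta/master_data")  # hypothetical path

# Push it to the RDBMS over JDBC; the writer only offers save modes such as
# "append" and "overwrite", there is no "merge" save mode
(master_df.write
    .format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/masterdb")  # hypothetical URL
    .option("dbtable", "public.master_data")
    .option("user", "dbuser")          # hypothetical credentials
    .option("password", "dbpassword")
    .mode("overwrite")
    .save())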

Appreciate any help on this.

Thanks

Krishna

6 REPLIES

Krish-685291
New Contributor III

Does anyone have a solution for this? Waiting for your valuable inputs.

-werners-
Esteemed Contributor III

That depends on which database you use, on whether Databricks and the database vendor provide an optimized writer, AND on whether merge is supported by the database.

I am not aware of an optimized writer which allows a merge statement.
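
The common workaround is to land the data in a staging table over JDBC and then run the merge on the database side. A rough sketch, assuming a Postgres target, hypothetical connection details, a master_data table with columns id, name, value, and the psycopg2 package available on the driver:

import psycopg2

# 1) Bulk-load the Delta table into a staging table over JDBC
(spark.read.format("delta").load("/mnt/delta/master_data")  # hypothetical path
    .write.format("jdbc")
    .option("url", "jdbc:postgresql://dbhost:5432/masterdb")  # hypothetical URL
    .option("dbtable", "staging.master_data")
    .option("user", "dbuser")
    .option("password", "dbpassword")
    .mode("overwrite")
    .save())

# 2) Merge on the database itself (Postgres syntax: INSERT ... ON CONFLICT)
conn = psycopg2.connect(host="dbhost", dbname="masterdb",
                        user="dbuser", password="dbpassword")
with conn, conn.cursor() as cur:
    cur.execute("""
        INSERT INTO public.master_data
        SELECT * FROM staging.master_data
        ON CONFLICT (id) DO UPDATE SET
            name  = EXCLUDED.name,
            value = EXCLUDED.value
    """)
conn.close()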

Krish-685291
New Contributor III

Hi Werners,

Thanks for responding. But does the Databricks API, delta.tables.DeltaTable, support any direct operations on external RDBMS tables?

I was a bit surprised this is not possible. Going the traditional PySpark SQL way is not that effective performance-wise.

-werners-
Esteemed Contributor III

I doubt it, because Delta Lake is a file format. An optimized file format, but a file format nonetheless.

For optimized writes to an RDBMS you also need a compute system which opens a connection and runs a driver. That is where the optimization could take place.

This supposed driver could leverage delta lake optimizations.

But I doubt that this is a high priority for Databricks. They are promoting the lakehouse architecture.

It is possible that they are working on optimized drivers for certain databases, but developing optimized drivers for many RDBMS systems? I doubt it.

Krish-685291
New Contributor III

Thanks, werners, for your valuable input.

My only concern is that they don't have support at the API level itself; I don't see any option there. If they provided support at the API level, the optimizations would then mostly be a matter of the relevant driver implementations. They could provide support for at least some of the popular RDBMSs, such as MySQL, Postgres, etc. Something like the sketch below is what I have in mind.
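
Purely as an illustration of what I mean (this API does not exist today; the mergeIntoJdbc method and its parameters are entirely hypothetical):

from delta.tables import DeltaTable

dt = DeltaTable.forPath(spark, "/mnt/delta/master_data")  # hypothetical path

# Hypothetical method: nothing like this exists in the current DeltaTable API,
# but it would mirror the existing DeltaTable.merge() builder
(dt.mergeIntoJdbc(
        url="jdbc:postgresql://dbhost:5432/masterdb",
        table="public.master_data",
        condition="target.id = source.id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())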

-werners-
Esteemed Contributor III

I get your point and concerns.

If there are plans in that direction, it will have to be a joint effort between Databricks and the database vendor.
