Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Can I merge a Delta Lake table into an RDBMS table directly? What is the preferred way in Databricks?

Krish-685291
New Contributor III

Hi,

I am dealing with updating master data. I perform UPSERT operations on a Delta Lake table, but after the upsert completes I also want to update the master data in an RDBMS table. Is there any support from Databricks to perform this operation effectively, in a highly performant way? There are PySpark SQL ways, as shown below, but I don't see a merge option.
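Here is roughly what I mean by the PySpark SQL way (just a sketch; the JDBC URL, credentials and table names below are placeholders):

# Sketch of the plain JDBC write I am referring to. Connection details
# and table names are placeholders. Only save modes like "overwrite" and
# "append" are available here -- there is no merge/upsert mode.
(
    spark.table("master_db.customers_delta")        # my Delta master table
        .write
        .format("jdbc")
        .option("url", "jdbc:postgresql://<host>:5432/<db>")
        .option("dbtable", "public.customers")       # target RDBMS table
        .option("user", "<user>")
        .option("password", "<password>")
        .mode("overwrite")                           # or "append"; no MERGE option
        .save()
)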

Appreciate any help on this.

Thanks

Krishna

6 REPLIES

Krish-685291
New Contributor III

Does anyone have a solution for this? Waiting for your valuable inputs.

-werners-
Esteemed Contributor III

That depends on which database you use, whether Databricks and the database vendor provide an optimized writer, AND whether merge is supported by the database.

I am not aware of an optimized writer which allows a merge statement.
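The usual workaround is to land the changed rows in a staging table over JDBC and then run the merge/upsert on the database side. A rough sketch below, assuming Postgres; the table names, key column, credentials and the psycopg2 dependency on the driver are all assumptions on my part:

# Rough sketch: push the changed rows to a staging table via JDBC,
# then let the database perform the upsert. All names are placeholders.
jdbc_url = "jdbc:postgresql://<host>:5432/<db>"
jdbc_props = {
    "user": "<user>",
    "password": "<password>",
    "driver": "org.postgresql.Driver",
}

# 1. Overwrite a staging table on the RDBMS with the rows to merge
changes_df = spark.table("master_db.customers_delta")   # or only the changed rows
changes_df.write.jdbc(url=jdbc_url, table="staging.customers",
                      mode="overwrite", properties=jdbc_props)

# 2. Run the upsert on the database itself (Postgres ON CONFLICT syntax)
import psycopg2   # runs on the driver node only

conn = psycopg2.connect(host="<host>", dbname="<db>",
                        user="<user>", password="<password>")
with conn, conn.cursor() as cur:
    cur.execute("""
        INSERT INTO public.customers AS t
        SELECT * FROM staging.customers
        ON CONFLICT (customer_id) DO UPDATE
        SET name = EXCLUDED.name,
            updated_at = EXCLUDED.updated_at
    """)
conn.close()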

Krish-685291
New Contributor III

Hi Werners,

Thanks for responding. But does the Databricks API, delta.tables.DeltaTable, support any direct operations on external RDBMS tables?

I was a bit surprised this is not possible. Going the traditional PySpark SQL way is not that effective performance-wise.

-werners-
Esteemed Contributor III

I doubt it, because Delta Lake is a file format. An optimized file format, but a file format nonetheless.

For optimized writes to an RDBMS you also need a compute system which opens a connection and runs a driver. That is where the optimization could take place.

Such a driver could leverage Delta Lake optimizations.

But I doubt that this is a high priority for Databricks; they are promoting the lakehouse architecture.

It is possible that they are working on optimized drivers for certain databases, but developing optimized drivers for many RDBMS systems? I doubt it.

Krish-685291
New Contributor III

Thanks werners, for your valuable input.

My only concern is that there is no support at the API level itself; I don't see any option. If they provided support at least at the API level, the optimizations would then come down to the relevant driver implementations. They could at least provide support for some of the popular RDBMSs, such as MySQL, Postgres, etc.

-werners-
Esteemed Contributor III

I get your point and concerns.

If there are plans in that direction, it will have to be a joint effort between Databricks and the database vendor.
