05-26-2022 08:06 AM
Hi,
I am dealing with updating master data. I'll do the UPSERT operations on the Delta Lake table, but after my UPSERT is complete I would also like to update the master data in the RDBMS table. Is there any support from Databricks to perform this operation effectively, in a highly performant way? There are PySpark SQL ways, as shown below, but I don't see a merge option.
Appreciate any help on this.
Thanks
Krishna
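Since the Spark JDBC writer only supports save modes like `append` and `overwrite` (no merge), a common workaround is to stage the Delta output into a temporary RDBMS table and then issue a database-side MERGE yourself through the vendor's driver. A minimal sketch of building such a statement; the table and column names (`customers_master`, `customers_staging`, `customer_id`, etc.) are hypothetical:

```python
def build_merge_sql(target, staging, key_cols, update_cols):
    """Build an ANSI MERGE statement that upserts staging rows into target."""
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    cols = ", ".join(key_cols + update_cols)
    src_cols = ", ".join(f"s.{c}" for c in key_cols + update_cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({src_cols})"
    )

sql = build_merge_sql("customers_master", "customers_staging",
                      ["customer_id"], ["name", "email"])
```

The staging table itself can be written with the plain JDBC writer (`df.write.jdbc(..., mode="overwrite")`), and the generated statement executed via the database's own Python driver; whether `MERGE` is available depends on the database and version.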
05-30-2022 06:22 AM
Does anyone have a solution for this? Waiting for your valuable inputs.
05-31-2022 04:35 AM
That depends on which database you use, whether Databricks + the database vendor have an optimized writer, AND whether merge is supported by the database.
I am not aware of an optimized writer which allows a merge statement.
05-31-2022 04:40 AM
Hi Werners,
Thanks for responding. But does the Databricks API, delta.tables.DeltaTable, support any direct operations against external RDBMS tables?
I was a bit surprised this is not possible. Going the traditional PySpark SQL way is not that performant.
05-31-2022 04:57 AM
I doubt it, because Delta Lake is a file format. An optimized file format, but a file format nonetheless.
For optimized writes to a RDBMS you also need a compute system which opens a connection and runs a driver. That is where the optimization could take place.
This supposed driver could leverage delta lake optimizations.
But I doubt that this is a high priority for Databricks. They are promoting the lakehouse architecture.
It is possible that they are working on optimized drivers for certain databases, but developing optimized drivers for many RDBMS systems? I doubt it.
05-31-2022 05:07 AM
Thanks Werners, for your valuable input.
My only concern is that they don't have support at the API level itself; I don't see any option. If they provided support at the API level, the optimizations would then be a matter of the relevant driver implementations. They could at least support some of the popular RDBMSs, such as MySQL, Postgres, etc.
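For Postgres in particular, an upsert can be expressed today without any Databricks-side support, using `INSERT ... ON CONFLICT` (available since PostgreSQL 9.5) and executed through a driver such as psycopg2. A minimal sketch of building that statement; the table and column names are hypothetical:

```python
def build_pg_upsert(target, key_cols, update_cols):
    """Build a PostgreSQL INSERT ... ON CONFLICT upsert with %s placeholders."""
    cols = key_cols + update_cols
    placeholders = ", ".join(["%s"] * len(cols))
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in update_cols)
    return (
        f"INSERT INTO {target} ({', '.join(cols)}) "
        f"VALUES ({placeholders}) "
        f"ON CONFLICT ({', '.join(key_cols)}) DO UPDATE SET {updates}"
    )

sql = build_pg_upsert("customers_master", ["customer_id"], ["name", "email"])
```

The resulting statement could then be run per partition of the Delta DataFrame (e.g. inside `foreachPartition` with `cursor.executemany`); MySQL has an analogous `INSERT ... ON DUPLICATE KEY UPDATE` form.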
05-31-2022 06:22 AM
I get your point and concerns.
If there are plans in that direction, it will have to be a joint effort of Databricks + db vendor.