05-26-2022 08:06 AM
Hi,
I am dealing with updating master data. I'll do the UPSERT operations on the Delta Lake table. But after my UPSERT is complete I'd like to update the master data on the RDBMS table as well. Is there any support from Databricks to perform this operation effectively, in a highly performant way? There are PySpark SQL ways as shown below, but I don't see a merge option.
Appreciate any help on this.
Thanks
Krishna
05-30-2022 06:22 AM
Does anyone have a solution for this? Waiting for your valuable inputs.
05-31-2022 04:35 AM
That depends on which database, whether Databricks + the database vendor have an optimized writer, AND whether merge is supported on the database.
I am not aware of an optimized writer which allows a merge statement.
05-31-2022 04:40 AM
Hi Werners,
Thanks for responding. But does the Databricks API, delta.tables.DeltaTable, support any direct operations on external RDBMS tables?
I was a bit surprised this is not possible. Going the traditional PySpark SQL way is not that performant.
05-31-2022 04:57 AM
I doubt it because delta lake is a file format. An optimized file format but a file format nonetheless.
For optimized writes to a RDBMS you also need a compute system which opens a connection and runs a driver. That is where the optimization could take place.
This supposed driver could leverage delta lake optimizations.
But I doubt that this is a high priority for Databricks. They are promoting the lakehouse architecture.
It is possible that they are working on optimized drivers for certain databases, but developing optimized drivers for many RDBMS systems? I doubt it.
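In the meantime, a common workaround (not a Databricks API, just a pattern) is to write the changed rows from the Delta table to a staging table over JDBC, then have the database itself run a MERGE from staging into the target. A minimal sketch of building that MERGE statement is below; the helper and all table/column names are hypothetical:

```python
def build_merge_sql(target, staging, key_cols, update_cols):
    """Build an ANSI-style MERGE statement that upserts `staging` into
    `target`, matching on `key_cols` and updating `update_cols`."""
    on = " AND ".join(f"t.{c} = s.{c}" for c in key_cols)
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in update_cols)
    all_cols = list(key_cols) + list(update_cols)
    insert_cols = ", ".join(all_cols)
    insert_vals = ", ".join(f"s.{c}" for c in all_cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON {on} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) "
        f"VALUES ({insert_vals})"
    )


# Hypothetical master-data table with an `id` key:
sql = build_merge_sql("master_data", "master_data_staging",
                      key_cols=["id"], update_cols=["name", "updated_at"])
print(sql)
```

On the Spark side you would first land the changes into the staging table with the built-in JDBC writer (which only supports append/overwrite, hence the staging step), then execute the MERGE through the vendor's own client or driver.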
05-31-2022 05:07 AM
Thanks werners, for your valuable input.
My only concern is that they don't have support at the API level itself; I don't see any option. If they provided support at the API level, the optimizations would then just be a matter of the relevant driver implementations. They could at least support some of the popular RDBMSs, such as MySQL, Postgres, etc.
05-31-2022 06:22 AM
I get your point and concerns.
If there are plans in that direction, it will have to be a joint effort of Databricks + db vendor.
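For the popular databases mentioned above, the database-side upsert step already exists natively: Postgres has INSERT ... ON CONFLICT and MySQL has INSERT ... ON DUPLICATE KEY UPDATE. The stand-in demo below uses SQLite (which shares Postgres's ON CONFLICT syntax) purely to illustrate that step; the table, columns, and rows are made up:

```python
import sqlite3

# In-memory SQLite table standing in for the RDBMS master-data table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE master_data (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO master_data VALUES (1, 'old-name')")

# Rows collected from the Delta table after its own upsert would be
# bound in as parameters; two literal rows simulate that feed here.
rows = [(1, "new-name"), (2, "brand-new")]
conn.executemany(
    "INSERT INTO master_data (id, name) VALUES (?, ?) "
    "ON CONFLICT(id) DO UPDATE SET name = excluded.name",
    rows,
)

# Row 1 was updated in place, row 2 was inserted.
print(sorted(conn.execute("SELECT id, name FROM master_data").fetchall()))
```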