02-03-2023 01:33 PM
I have spent way too much time trying to find a solution to the problem of efficiently writing data to Databricks via JDBC/ODBC. I have looked through countless docs, blogs, and repos, and I cannot find a single example where someone sets some kind of batch/bulk/buffer option to tell the JDBC driver not to write/insert data row-wise into a Delta table. Is anyone aware of a solution? Many JDBC drivers for other destinations offer bulk/batch writes; even the Databricks MS SQL connector supports a bulkCopyToSqlDB method, and there is something similar for Snowflake. Why can I not find anything for writing to Databricks? I have looked into many things, including those below. Any pointer is much appreciated.
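For reference, the standard client-side JDBC batching pattern looks like the sketch below (the connection URL and table are placeholders); even with addBatch/executeBatch, the Databricks JDBC driver still appears to apply the rows as individual inserts into the Delta table, which is exactly the problem:

```scala
import java.sql.DriverManager

// A minimal sketch of client-side JDBC batching; URL and table are placeholders.
val url = "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>;" +
  "AuthMech=3;UID=token;PWD=<personal-access-token>"
val conn = DriverManager.getConnection(url)
val stmt = conn.prepareStatement(
  "INSERT INTO my_schema.my_table (id, name) VALUES (?, ?)")

// Rows are queued client-side and flushed in one executeBatch() call,
// but the server may still apply them as individual row inserts.
for ((id, name) <- Seq((1, "a"), (2, "b"), (3, "c"))) {
  stmt.setInt(1, id)
  stmt.setString(2, name)
  stmt.addBatch()
}
stmt.executeBatch()
stmt.close()
conn.close()
```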
03-31-2023 08:57 AM
@Alexander Wagner:
Databricks Delta Lake supports bulk writes, which are far more efficient than row-wise writes. However, the method for achieving this may vary depending on the specific JDBC driver being used. Here are a few possible solutions:
- Stage the data as files in cloud storage and load them with COPY INTO instead of inserting rows over JDBC.
- Use a driver that offers a bulk load option, such as the CData Databricks JDBC driver.
- If the producing process can run on Spark, write the DataFrame to the Delta table directly rather than going through JDBC (see the sketch at the end of this reply).
Keep in mind that the optimal solution may vary depending on your specific use case, so it's important to test and evaluate the performance of each approach.
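As a minimal sketch of the last approach above (the source path and table name are hypothetical):

```scala
// Native bulk path: a Spark DataFrame write to Delta is distributed and
// file-based, so it avoids row-wise JDBC inserts entirely.
// `spark` is the SparkSession predefined in Databricks notebooks and jobs.
val df = spark.read.parquet("/tmp/staged_input")  // hypothetical source

df.write
  .format("delta")
  .mode("append")                     // or "overwrite"
  .saveAsTable("my_schema.my_table")  // hypothetical target table
```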
04-03-2023 09:18 AM
Hi @Suteja Kanuri, thanks for the reply. Let me take your options one by one.
For the time being, we leverage the CData Databricks JDBC driver with its bulk load option (from SAS), which uses a two-step transfer: it first writes the data to cloud storage (ADLS) and then uses Databricks COPY INTO for the load and Hive registration. It's pretty fast, but it has a few drawbacks, such as full write only (no append/insert/update) and no schema definition (i.e., CData drives the type conversion). But it's certainly better than a row-wise Databricks JDBC write.
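In case it helps anyone else, the same two-step pattern can be reproduced without CData: stage the files in ADLS yourself, then issue COPY INTO over a plain JDBC statement. A rough sketch (the URL, storage path, and table name are hypothetical placeholders); because you create and own the target table, you also keep control of the schema and get append semantics instead of a full rewrite:

```scala
import java.sql.DriverManager

// Assumes the data has already been staged in ADLS as Parquet files.
// The URL, storage path, and table name are hypothetical placeholders.
val conn = DriverManager.getConnection(
  "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>;" +
  "AuthMech=3;UID=token;PWD=<personal-access-token>")
val stmt = conn.createStatement()

// COPY INTO loads the staged files in bulk and appends them to the Delta table.
stmt.execute("""
  COPY INTO my_schema.my_table
  FROM 'abfss://staging@myaccount.dfs.core.windows.net/my_table/'
  FILEFORMAT = PARQUET
""")

stmt.close()
conn.close()
```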
04-05-2023 09:59 PM
@Alexander Wagner:
You are correct that the Apache Spark JDBC connector does not offer a bulk/batch write option for writing into Spark itself; batching is only exposed when writing from Spark out to other databases.
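For contrast, here is the direction where batching is exposed: writing from Spark out to an external database, where the batchsize option (default 1000) controls how many rows go into each JDBC batch. A minimal sketch with placeholder connection details:

```scala
// Spark -> external RDBMS: the JDBC sink batches inserts client-side.
// All connection details below are hypothetical placeholders.
val df = spark.range(0, 100000).toDF("id")

df.write
  .format("jdbc")
  .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
  .option("dbtable", "dbo.my_table")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("batchsize", "10000")  // rows per JDBC batch (default 1000)
  .mode("append")
  .save()
```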
It sounds like you have found a solution using the CData Databricks JDBC driver with its bulk load option, which takes a two-step transfer approach. While this approach has some drawbacks, it seems to be working well for your use case. All the best!
04-09-2023 12:02 AM
Hi @Alexander Wagner
Hope everything is going great.
Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.
Cheers!
04-09-2023 01:44 AM
@Vidula Khanna yes, have done so. Thanks!