02-03-2023 01:33 PM
I have spent way too much time trying to find a solution to the problem of efficiently writing data to Databricks via JDBC/ODBC. I have looked through countless docs, blogs, and repos, and I cannot find a single example where someone sets some kind of batch/bulk/buffer option to tell the JDBC driver not to write/insert data row-wise into a Delta table. Is anyone aware of a solution? Many JDBC drivers for other destinations offer bulk/batch writes; even the Databricks MS SQL connector supports a bulkCopyToSqlDB method, and there is something similar for Snowflake. Why can I not find anything for writing to Databricks? I looked into many things, including the below. Any pointer is much appreciated.
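For context, here is a minimal sketch of the kind of code I am running (connection details and table name are placeholders, and the URL format may differ by driver version). Even with the standard JDBC batching API, the driver still seems to execute row-wise INSERTs:

```scala
import java.sql.DriverManager

// Placeholder URL for the Databricks JDBC driver; older Simba versions use jdbc:spark:// instead.
val url = "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>;" +
  "AuthMech=3;UID=token;PWD=<personal-access-token>"

val conn = DriverManager.getConnection(url)
try {
  val stmt = conn.prepareStatement("INSERT INTO my_delta_table (id, name) VALUES (?, ?)")
  for (i <- 1 to 1000) {
    stmt.setInt(1, i)
    stmt.setString(2, s"row_$i")
    stmt.addBatch()   // buffered on the client...
  }
  stmt.executeBatch() // ...but still lands as row-wise inserts on the Delta table
  stmt.close()
} finally {
  conn.close()
}
```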
03-31-2023 08:57 AM
@Alexander Wagner:
Databricks Delta Lake supports bulk writes, which are more efficient than row-wise writes. However, the method for achieving this may vary depending on the specific JDBC driver being used. Here are a few possible solutions:
Keep in mind that the optimal solution may vary depending on your specific use case, so it's important to test and evaluate the performance of each approach.
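For example, one generic pattern worth testing (sketched below with a hypothetical table and placeholder connection details, not specific to any particular driver) is to pack many rows into a single multi-row INSERT statement, which cuts the number of round trips and Delta commits compared with one INSERT per row:

```scala
import java.sql.DriverManager

// Placeholder connection details; adapt to your workspace.
val url = "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>;" +
  "AuthMech=3;UID=token;PWD=<personal-access-token>"

val conn = DriverManager.getConnection(url)
try {
  val rows = (1 to 500).map(i => s"($i, 'row_$i')")
  // One statement carrying 500 rows -> one Delta commit instead of 500.
  conn.createStatement().execute(
    s"INSERT INTO my_delta_table (id, name) VALUES ${rows.mkString(", ")}"
  )
} finally {
  conn.close()
}
```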
04-03-2023 09:18 AM
Hi @Suteja Kanuri, thanks for the reply. Let me take your options one by one.
For the time being, we leverage the CData Databricks JDBC driver with its bulk load option (from SAS), which actually uses a two-step transfer: it first writes the data to cloud storage (ADLS) and then uses Databricks COPY INTO for the load and Hive registration. It's pretty fast, but it has a few drawbacks, such as supporting full writes only (no append/insert/update) and no schema definition (i.e., CData drives the type conversion). Still, it's certainly better than a row-wise Databricks JDBC write.
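For anyone else landing here, the same two-step pattern can be approximated by hand (storage path, table name, and connection details below are placeholders): stage the data as Parquet files in ADLS using whatever client your source system provides, then issue a single COPY INTO over a SQL connection:

```scala
import java.sql.DriverManager

// Step 1 happens outside Databricks: write the source data as Parquet files
// to an ADLS staging path, e.g. via the Azure SDK or your ETL tool.

// Step 2: one COPY INTO statement loads all staged files in bulk.
val url = "jdbc:databricks://<workspace-host>:443;httpPath=<http-path>;" +
  "AuthMech=3;UID=token;PWD=<personal-access-token>"

val conn = DriverManager.getConnection(url)
try {
  conn.createStatement().execute(
    """COPY INTO my_delta_table
      |FROM 'abfss://<container>@<account>.dfs.core.windows.net/staging/'
      |FILEFORMAT = PARQUET""".stripMargin
  )
} finally {
  conn.close()
}
```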
04-05-2023 09:59 PM
@Alexander Wagner:
You are correct that the Apache Spark JDBC connector does not offer a bulk/batch write option for writing into Spark itself; batching is only supported for writes from Spark out to other databases.
It sounds like you have found a solution using the CData Databricks JDBC driver with its bulk load option, which takes a two-step transfer approach. While this approach has some drawbacks, it seems to be working well for your use case. All the best!
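For completeness, here is a sketch of the direction that is supported (target database and credentials are placeholders): when writing from Spark out to another database, the JDBC writer exposes a batchsize option to control how many rows go into each JDBC batch:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-outbound").getOrCreate()
val df = spark.range(100000).withColumnRenamed("id", "value")

// Spark -> external database: batching IS supported on this path.
df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://<host>:5432/<db>") // placeholder target
  .option("dbtable", "target_table")
  .option("user", "<user>")
  .option("password", "<password>")
  .option("batchsize", "10000") // rows per JDBC batch on the outbound write
  .mode("append")
  .save()
```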
04-09-2023 12:02 AM
Hi @Alexander Wagner
Hope everything is going great.
Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you.
Cheers!
04-09-2023 01:44 AM
@Vidula Khanna yes, I have done so. Thanks.