Re: Databricks JDBC/ODBC write batch size

Alexander1 · ‎04-03-2023

Hi @Suteja Kanuri, thanks for the reply. Let me take your options one by one.

1. Apache Spark JDBC connector: I don't see a bulk/batch write option for writing TO Spark, only for writing from Spark to OTHER databases. If you have a docs pointer, it would be highly appreciated.
2. spark.databricks.delta.maxFileSize: That is an option that applies after the data has been transferred; it does not increase transfer speed.
3. OPTIMIZE: see (2).
4. rewriteBatchedStatements: see (2).

For the time being, we use the CData Databricks JDBC driver with its bulk load option (from SAS). It actually performs a two-step transfer: it first writes the data to cloud storage (ADLS) and then uses Databricks COPY INTO for the load and Hive registration. It's pretty fast but has a few drawbacks, such as full write only (no append/insert/update) and no schema definition (i.e. CData drives the type conversion). But it's certainly better than a row-wise Databricks JDBC write.
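
To illustrate the second step of that two-step pattern, here is a rough sketch of how a COPY INTO statement could be assembled once the files are already staged in cloud storage. This is not what CData does internally (I don't know its implementation); the helper name `build_copy_into`, the table name, and the ADLS path are all made up for illustration, and the staging upload plus the actual execution against a SQL warehouse are left out.

```python
from typing import Dict, Optional

# File formats accepted by Databricks COPY INTO.
_ALLOWED_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"}


def build_copy_into(
    table: str,
    source_uri: str,
    file_format: str = "PARQUET",
    format_options: Optional[Dict[str, str]] = None,
) -> str:
    """Build a COPY INTO statement for files already staged at source_uri.

    Hypothetical helper: it only assembles the SQL text and does not
    upload files or run the statement.
    """
    fmt = file_format.upper()
    if fmt not in _ALLOWED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    if "'" in source_uri:
        # Keep the quoted path literal well-formed.
        raise ValueError("source_uri must not contain single quotes")

    stmt = f"COPY INTO {table}\nFROM '{source_uri}'\nFILEFORMAT = {fmt}"
    if format_options:
        # Sort keys so the generated statement is deterministic.
        opts = ", ".join(
            f"'{key}' = '{value}'" for key, value in sorted(format_options.items())
        )
        stmt += f"\nFORMAT_OPTIONS ({opts})"
    return stmt


if __name__ == "__main__":
    # Assumed example values; replace with your own table and staging path.
    print(
        build_copy_into(
            "sales.orders",
            "abfss://staging@example.dfs.core.windows.net/orders/",
            "csv",
            {"header": "true"},
        )
    )
```

The resulting string would then be submitted over the same JDBC/ODBC connection, so only one small statement crosses the wire instead of the rows themselves.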
