Alexander1
New Contributor III

Hi @Suteja Kanuri, thanks for the reply. Let me take your options one by one.

  1. Apache Spark JDBC connector: I don't see a bulk/batch write option to Spark, only from Spark to other databases. If you have a docs pointer, it would be highly appreciated;
  2. spark.databricks.delta.maxFileSize: That option applies after the data has been transferred; it does not increase transfer speed;
  3. OPTIMIZE: see (2)
  4. rewriteBatchedStatements: see (2)

For the time being, we use the CData Databricks JDBC driver with its bulk load option (from SAS). It actually performs a two-step transfer: it first writes the data to cloud storage (ADLS) and then uses Databricks COPY INTO for the load and Hive registration. It's pretty fast but has a few drawbacks, such as full write only (no append/insert/update) and no schema definition (i.e. CData drives the type conversion). But it's certainly better than row-wise Databricks JDBC writes.
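
For anyone following along, the second step of that two-step transfer can be sketched roughly as below. This is only a minimal illustration, not what the CData driver runs internally: the helper `build_copy_into`, the table name `sales.orders`, and the ADLS path are all made up for the example, and it assumes the files were already staged as Parquet. It only builds the COPY INTO statement text; you would still need to run it from a notebook or a SQL connection.

```python
# Sketch only: builds the text of a Databricks COPY INTO statement for files
# that have already been staged in cloud storage. All names are hypothetical.

# File formats accepted by COPY INTO's FILEFORMAT clause.
ALLOWED_FORMATS = {"CSV", "JSON", "AVRO", "ORC", "PARQUET", "TEXT", "BINARYFILE"}


def build_copy_into(table: str, source_path: str, file_format: str = "PARQUET") -> str:
    """Return a COPY INTO statement loading staged files into `table`."""
    fmt = file_format.upper()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    if "'" in source_path:
        # Keep the sketch simple: refuse paths that would break the SQL literal.
        raise ValueError("source path must not contain single quotes")
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}'\n"
        f"FILEFORMAT = {fmt}"
    )


# Hypothetical target table and staging location (assumptions, not from the thread).
statement = build_copy_into(
    "sales.orders",
    "abfss://staging@exampleaccount.dfs.core.windows.net/orders/",
)
print(statement)
```

In a Databricks notebook the resulting string could be passed to `spark.sql(statement)`, assuming the target table already exists and the cluster has access to the storage location.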
