Options
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-03-2023 09:18 AM
Hi @Suteja Kanuri thanks for the reply. Let me take your options one by one.
- Apache Spark JDBC connector: I don't see a bulk/batch write option TO spark only from spark to OTHER databases; if you have a docs pointer, it would be highly appreciated;
- spark.databricks.delta.maxFileSize: That is an option after data is transferred, not to increase transfer speed;
- OPTIMIZE: see (2)
- rewriteBatchedStatements: see (2)
For the time being, we leverage CData Databricks JDBC driver with bulk load option (from SAS) which actually uses a two-step transfer first writing data to cloud storage (ADLS) and then using Databricks COPY INTO for load and Hive registration. It's pretty fast but has a few drawbacks such as full write only (no append/insert/update) and no schema definition (i.e. CData drives type conversion). But it's certainly better than row-wise Databricks JDBC write.