Hi @Pravin08, when dealing with Databricks, JDBC writes, and Oracle, there are a few things you can tune to improve performance and address the issue you're facing:
Batch Size:
- JDBC Batch Size: By default, the JDBC connector writes data row by row. To improve efficiency, specify a batch size when creating the JDBC connection so that multiple rows are inserted in a single round trip, reducing overhead. You can set this with the `batchsize` option (see the sketch after this list).
- Delta Lake: If you're writing from Delta Lake, bulk writes are supported; the same `batchsize` option on the JDBC connection applies, and larger batch sizes can improve write performance.
- Apache Spark JDBC Connector: Consider using the standard Apache Spark JDBC connector instead of the Databricks-specific one. The Spark connector supports bulk inserts, which work with Delta Lake as the source. Make sure the batch size is set appropriately.
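As a minimal PySpark sketch of the JDBC write with an explicit batch size: the URL, table name, credentials, secret scope, and partition count below are placeholders you'd replace with your own.

```python
# Minimal sketch: JDBC write to Oracle with an explicit batch size.
# All connection details below are placeholders.
(df.write                                        # df: your source DataFrame
   .format("jdbc")
   .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCLPDB1")
   .option("dbtable", "SALES_STAGING")
   .option("user", "app_user")
   .option("password", dbutils.secrets.get("my-scope", "oracle-pw"))
   .option("batchsize", 10000)   # rows per round trip; Spark's default is 1000
   .option("numPartitions", 8)   # parallel connections opened against Oracle
   .mode("append")
   .save())
```

Raising `batchsize` and `numPartitions` together is usually what moves the needle: the former cuts round trips, the latter adds parallel writers. Just keep the partition count modest so you don't overwhelm the Oracle instance with connections.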
File Size Control:
- Set the `spark.databricks.delta.maxFileSize` option to control the size of the files being written. Increasing this value makes Delta Lake write larger files, which can enhance performance (a sketch follows below).
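A sketch of setting that config before the write, assuming the key above is valid on your runtime (Delta file-size settings vary across Databricks runtime versions, so verify it against the docs for your version). The table name is a placeholder.

```python
# Sketch: raise the target Delta file size to ~256 MB before writing.
# The config key comes from the suggestion above; verify it on your runtime.
spark.conf.set("spark.databricks.delta.maxFileSize", str(256 * 1024 * 1024))

# "sales_delta" is a placeholder table name.
df.write.format("delta").mode("append").saveAsTable("sales_delta")
```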
Optimize Command:
- Run the `OPTIMIZE` command periodically on your Delta tables. It merges small files into larger ones, improving query performance (sketch below).
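For example, with `sales_delta` as a placeholder table name and the `ZORDER` column purely illustrative:

```python
# Sketch: compact small files in a Delta table. ZORDER is optional and
# only pays off if queries frequently filter on the named column.
spark.sql("OPTIMIZE sales_delta")
spark.sql("OPTIMIZE sales_delta ZORDER BY (customer_id)")
```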
Rewrite Batched Statements:
- If you're using the Databricks JDBC connector, consider setting the `rewriteBatchedStatements` option to `true`. This lets the driver send multiple statements in a single batch, potentially improving write performance. Note that support for this flag varies by JDBC driver, so check your driver's documentation (sketch below).
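A sketch of passing that flag through the Spark JDBC options: Spark forwards unrecognized options to the driver as connection properties. `jdbc_url` and the table name are placeholders, and whether the flag has any effect depends on your driver (it is best known from MySQL Connector/J).

```python
# Sketch: forward rewriteBatchedStatements to the JDBC driver.
# Support is driver-dependent; confirm your driver honors this flag.
(df.write
   .format("jdbc")
   .option("url", jdbc_url)                     # assumed defined elsewhere
   .option("dbtable", "SALES_STAGING")          # placeholder table
   .option("rewriteBatchedStatements", "true")
   .option("batchsize", 10000)
   .mode("append")
   .save())
```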
I hope these suggestions help resolve the issue and optimize your data-loading process!