
Oracle table load from Databricks

Pravin08
New Contributor III

I am trying to load a DataFrame from Databricks into a target Oracle table using the write method with the JDBC API. I have the correct drivers. The job and its corresponding stages complete, and the data is loaded into the Oracle target table, but the command cell in the Databricks notebook keeps running and eventually times out. Please let me know if there are any suggestions.
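For reference, a minimal sketch of this kind of JDBC write — the URL, table name, and credentials below are hypothetical placeholders rather than values from the post, and df stands for the DataFrame being written:

    # Minimal sketch of a DataFrame write to Oracle over JDBC.
    # All connection details are hypothetical placeholders.
    (df.write
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # hypothetical host/service
        .option("dbtable", "APP_SCHEMA.TARGET_TABLE")               # hypothetical target table
        .option("user", "app_user")
        .option("password", "app_password")
        .option("driver", "oracle.jdbc.driver.OracleDriver")        # class from the ojdbc8 jar
        .mode("append")
        .save())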

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @Pravin08, when dealing with Databricks, JDBC writes, and Oracle, there are a few considerations that can improve performance and address the issue you're facing:

  1. Batch Size:

    • JDBC Batch Size: By default, the Spark JDBC writer inserts rows in batches of 1,000. To improve efficiency, specify a larger batch size via the batchsize option when creating the JDBC connection; this lets more rows be inserted in a single round trip, reducing overhead (see the first sketch after this list).
    • Delta Lake: If you're using Delta Lake, it supports bulk writes. The same batchsize option applies when writing out over JDBC, and larger batch sizes can improve write performance.
    • Apache Spark JDBC Connector: Consider using the Apache Spark JDBC connector instead of the Databricks-specific one. The Spark connector supports batched inserts and can be used with Delta Lake; ensure the batch size is set appropriately.
  2. File Size Control:

    • Set the spark.databricks.delta.maxFileSize option to control the size of the files being written. Increasing this value makes Delta Lake write larger files, which can enhance performance (see the second sketch after this list).
  3. Optimize Command:

    • Run the OPTIMIZE command periodically on your Delta tables. It merges small files into larger ones, improving query performance.
  4. Rewrite Batched Statements:

    • If you're using the Databricks JDBC connector, consider setting the rewriteBatchedStatements option to true. This allows the driver to send multiple statements in a single batch, potentially improving write performance.
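First sketch (for item 1): the same style of JDBC write with an explicit batch size. Spark's batchsize option defaults to 1,000 rows per batch; the 10,000 below is an illustrative value to tune for your table, and all connection details are hypothetical placeholders:

    # Sketch: raising the JDBC batch size to cut per-row round trips.
    # Connection details are hypothetical placeholders.
    (df.write
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
        .option("dbtable", "APP_SCHEMA.TARGET_TABLE")
        .option("user", "app_user")
        .option("password", "app_password")
        .option("driver", "oracle.jdbc.driver.OracleDriver")
        .option("batchsize", 10000)  # rows per batch; Spark's default is 1000
        .mode("append")
        .save())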
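Second sketch (for items 2 and 3): file-size tuning and compaction, which apply only when the target is a Delta table. The config name is the one given above; the byte-valued setting and the table name are assumptions for illustration:

    # Sketch: Delta file-size tuning and compaction (Delta targets only).
    # Value and table name are hypothetical; check the docs for exact semantics.
    spark.conf.set("spark.databricks.delta.maxFileSize", 256 * 1024 * 1024)  # assumed bytes, ~256 MB
    spark.sql("OPTIMIZE my_schema.my_delta_table")  # merges small files into larger ones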

 

I hope these suggestions help resolve the issue and optimize your data-loading process! 🚀

Pravin08
New Contributor III

Thanks for the response. Can you please elaborate on the Apache Spark JDBC connector? I am using the ojdbc8 driver as per the Databricks documentation. I am not using Delta Lake. I have the data in a DataFrame and am using the write method to insert it into Oracle.
