
Oracle table load from Databricks

Pravin08
New Contributor III

I am trying to load a DataFrame from Databricks into a target Oracle table using the write method with the JDBC API. I have the correct drivers. The job and its corresponding stages complete, and the data is loaded into the Oracle target table, but the command cell in the Databricks notebook keeps running and eventually times out. Please let me know if there are any suggestions.
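For reference, a minimal sketch of this kind of JDBC write — the URL, table name, and credentials below are hypothetical placeholders rather than values from the post, and df stands for the DataFrame being written:

    # Minimal sketch of a DataFrame write to Oracle over JDBC.
    # All connection details are hypothetical placeholders.
    (df.write
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")  # hypothetical host/service
        .option("dbtable", "APP_SCHEMA.TARGET_TABLE")               # hypothetical target table
        .option("user", "app_user")
        .option("password", "app_password")
        .option("driver", "oracle.jdbc.driver.OracleDriver")        # class from the ojdbc8 jar
        .mode("append")
        .save())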

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @Pravin08, when dealing with Databricks, JDBC writes, and Oracle, there are a few considerations that can improve performance and address the issue you're facing:

  1. Batch Size:

    • JDBC Batch Size: By default, the Spark JDBC writer inserts rows in batches of 1,000. To improve efficiency, specify a larger batch size via the batchsize option when creating the JDBC connection; this lets more rows be inserted in a single round trip, reducing overhead (see the first sketch after this list).
    • Delta Lake: If you're using Delta Lake, it supports bulk writes. The same batchsize option applies when writing out over JDBC, and larger batch sizes can improve write performance.
    • Apache Spark JDBC Connector: Consider using the Apache Spark JDBC connector instead of the Databricks-specific one. The Spark connector supports batched inserts and can be used with Delta Lake; ensure the batch size is set appropriately.
  2. File Size Control:

    • Set the spark.databricks.delta.maxFileSize option to control the size of the files being written. Increasing this value makes Delta Lake write larger files, which can enhance performance (see the second sketch after this list).
  3. Optimize Command:

    • Run the OPTIMIZE command periodically on your Delta tables. It merges small files into larger ones, improving query performance.
  4. Rewrite Batched Statements:

    • If you're using the Databricks JDBC connector, consider setting the rewriteBatchedStatements option to true. This allows the driver to send multiple statements in a single batch, potentially improving write performance.
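First sketch (for item 1): the same style of JDBC write with an explicit batch size. Spark's batchsize option defaults to 1,000 rows per batch; the 10,000 below is an illustrative value to tune for your table, and all connection details are hypothetical placeholders:

    # Sketch: raising the JDBC batch size to cut per-row round trips.
    # Connection details are hypothetical placeholders.
    (df.write
        .format("jdbc")
        .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1")
        .option("dbtable", "APP_SCHEMA.TARGET_TABLE")
        .option("user", "app_user")
        .option("password", "app_password")
        .option("driver", "oracle.jdbc.driver.OracleDriver")
        .option("batchsize", 10000)  # rows per batch; Spark's default is 1000
        .mode("append")
        .save())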
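Second sketch (for items 2 and 3): file-size tuning and compaction, which apply only when the target is a Delta table. The config name is the one given above; the byte-valued setting and the table name are assumptions for illustration:

    # Sketch: Delta file-size tuning and compaction (Delta targets only).
    # Value and table name are hypothetical; check the docs for exact semantics.
    spark.conf.set("spark.databricks.delta.maxFileSize", 256 * 1024 * 1024)  # assumed bytes, ~256 MB
    spark.sql("OPTIMIZE my_schema.my_delta_table")  # merges small files into larger ones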

 

I hope these suggestions help resolve the issue and optimize your data-loading process! 🚀

Pravin08
New Contributor III

Thanks for the response. Can you please elaborate on the Apache Spark JDBC connector? I am using the ojdbc8 driver as per the Databricks documentation. I am not using Delta Lake. I have the data in a DataFrame and am using the write method to insert it into Oracle.
