05-02-2025 10:33 AM
Here are some suggestions. I'm not sure they fit what you are doing, but they are worth mentioning.
The Databricks JDBC driver currently does not support batch updates, which is why your updates appear to process row by row with a batch size of 1.
Here are the details and possible workarounds:
- Driver Limitation:
  - The Databricks JDBC driver enforces a batch size of 1 for updates because it does not currently support batch operations in auto-commit mode.
- Workarounds:
  - Use `COPY INTO`: Databricks supports the `COPY INTO` command, which can handle bulk data ingestion efficiently. This approach sidesteps the limitations of JDBC for batch updates.
  - Batch Inserts Using Spark SQL: You can implement a workaround by inserting multiple rows in a single SQL statement via Spark SQL's `VALUES` clause. For instance, you can construct an `INSERT INTO` statement that batches hundreds of rows within a single operation. Note that you may need additional logic to handle splitting large jobs into manageable chunks.
  - Programmatic Ingestion: If `COPY INTO` or Spark SQL is not feasible, consider using Databricks' supported ingestion methods like DataFrames or Delta Lake APIs for optimized data writes.
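To make the multi-row `INSERT INTO ... VALUES` workaround a bit more concrete, here is a minimal sketch of the chunking logic in Python. It only builds the statement text and a flat parameter list; it does not talk to Databricks. The helper names (`chunked`, `build_multi_row_insert`), the table `events`, its columns, and the chunk size of 500 are all made up for illustration, and the `?` positional markers assume a JDBC-style prepared statement, so adjust them to whatever your client expects.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple


def chunked(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_multi_row_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence]
) -> Tuple[str, list]:
    """Build one INSERT statement covering all `rows`, plus its flat parameter list.

    `table` and `columns` are interpolated into the SQL text directly, so they
    must come from trusted code, never from user input.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("row length does not match column count")
    # One "(?, ?, ...)" group per row, joined into a single VALUES clause.
    row_placeholder = "(" + ", ".join("?" for _ in columns) + ")"
    values_clause = ", ".join(row_placeholder for _ in rows)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES {values_clause}"
    # Flatten the rows in the same order as the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    # Hypothetical data and chunk size; tune the size for your workload.
    data = [(i, f"name-{i}") for i in range(1200)]
    for chunk in chunked(data, 500):
        sql, params = build_multi_row_insert("events", ["id", "name"], chunk)
        # Hand `sql` and `params` to your own prepared-statement call here.
        print(len(chunk), "rows,", len(params), "parameters")
```

Each chunk becomes one statement and one round trip instead of one per row. Keeping the values as bound parameters rather than formatting them into the SQL string avoids quoting problems, but very large chunks can run into statement-size or parameter-count limits, so it is worth testing the chunk size against your own setup.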