Louis_Frolio
Databricks Employee

Here are some suggestions. I'm not sure they fit what you are doing, but they are worth mentioning.

The Databricks JDBC driver currently does not support batch updates, which is why your updates are processed row by row, with an effective batch size of 1.
Here are the details and possible workarounds:
  1. Driver Limitation:
    • The Databricks JDBC driver enforces a batch size of 1 for updates because it does not currently support batch operations in auto-commit mode.
  2. Workarounds:
    • Use COPY INTO: Stage the data as files (for example CSV or Parquet) in cloud storage or a volume, then load them into the target table with a single COPY INTO statement. The bulk work then happens on the server, which sidesteps the JDBC batch-update limitation.
    • Batch Inserts Using Spark SQL: Instead of issuing one INSERT per row, put many rows into a single INSERT INTO ... VALUES (...), (...), ... statement so that hundreds of rows are written in one operation. For large jobs, you will need additional logic to split the rows into manageable chunks so that no single statement grows too large.
    • Programmatic Ingestion: If neither COPY INTO nor multi-row inserts are feasible, consider Databricks' supported ingestion methods, such as DataFrames or the Delta Lake APIs, which are optimized for large writes.
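
To illustrate the multi-row insert workaround, here is a minimal Java sketch using plain JDBC. It assumes you already have an open `Connection` to Databricks; the class and method names (`MultiRowInsert`, `buildInsertSql`, `insertInChunks`) are hypothetical helpers, not part of the driver's API. Each chunk becomes one `INSERT INTO ... VALUES (?, ?), (?, ?), ...` statement with bound parameters, so N rows take roughly N / chunkSize statements instead of N.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: sends rows as multi-row INSERT statements instead of one INSERT per row.
final class MultiRowInsert {

    private MultiRowInsert() {}

    // Builds "INSERT INTO <table> (c1, c2, ...) VALUES (?, ?, ...), (?, ?, ...), ..." for rowCount rows.
    static String buildInsertSql(String table, List<String> columns, int rowCount) {
        String rowPlaceholders = "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        String allRows = String.join(", ", Collections.nCopies(rowCount, rowPlaceholders));
        return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + allRows;
    }

    // Inserts rows in chunks of at most chunkSize rows and returns the number of statements executed.
    static int insertInChunks(Connection conn, String table, List<String> columns,
                              List<List<Object>> rows, int chunkSize) throws SQLException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Validate every row up front so a malformed row is rejected before anything is written.
        for (List<Object> row : rows) {
            if (row.size() != columns.size()) {
                throw new IllegalArgumentException(
                        "row has " + row.size() + " values, expected " + columns.size());
            }
        }
        int statements = 0;
        for (int start = 0; start < rows.size(); start += chunkSize) {
            List<List<Object>> chunk = rows.subList(start, Math.min(start + chunkSize, rows.size()));
            String sql = buildInsertSql(table, columns, chunk.size());
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int index = 1; // JDBC parameter indexes start at 1
                for (List<Object> row : chunk) {
                    for (Object value : row) {
                        ps.setObject(index++, value);
                    }
                }
                ps.executeUpdate();
                statements++;
            }
        }
        return statements;
    }
}
```

For example, 10,000 rows with a chunk size of 500 go out as 20 statements rather than 10,000. Drivers and servers may limit how many parameters a single statement can carry, so treat the chunk size as something to tune for your workload rather than a fixed value.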
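
For the COPY INTO route, the load can be issued over the same JDBC connection as a single statement. The sketch below assumes the data has already been written as CSV files with a header row to a location Databricks can read, and that the target table exists; `CopyIntoLoader`, the table name `main.default.events`, and the path `/Volumes/main/default/landing/events/` are hypothetical placeholders.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: loads staged CSV files into an existing table with one COPY INTO statement.
final class CopyIntoLoader {

    private CopyIntoLoader() {}

    // Builds the COPY INTO statement. sourcePath is inlined as a SQL string literal,
    // so paths containing a single quote are rejected rather than escaped.
    static String buildCopyIntoSql(String table, String sourcePath) {
        if (sourcePath.contains("'")) {
            throw new IllegalArgumentException("sourcePath must not contain a single quote");
        }
        return "COPY INTO " + table
                + " FROM '" + sourcePath + "'"
                + " FILEFORMAT = CSV"
                + " FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')"
                + " COPY_OPTIONS ('mergeSchema' = 'true')";
    }

    // Runs the whole load as one server-side statement over an existing JDBC connection.
    static void load(Connection conn, String table, String sourcePath) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            stmt.execute(buildCopyIntoSql(table, sourcePath));
        }
    }
}
```

A call would look like `CopyIntoLoader.load(conn, "main.default.events", "/Volumes/main/default/landing/events/")`. Because COPY INTO skips files it has already loaded, re-running it against the same location should not duplicate rows, which makes it convenient for repeated loads.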