Hi @ankit_kothiya1
Please find below my findings for your query
1. Slow INSERT Performance via Databricks JDBC Driver
Observation:
Each INSERT query takes about 1 second via the Databricks JDBC driver, which is unusually slow for high-throughput use cases.
Possible Reasons:
-
Row-by-Row Execution:
Recent versions of Databricks Runtime (14.x and above) have changed how JDBC insert operations are handled. Instead of batching inserts, each row is inserted individually, resulting in significant overhead and slow performance. This is a regression from earlier runtimes (such as 13.1), which supported bulk/batch inserts.
-
Batch Size Ignored:
Even if you set the batchsize
parameter or use JDBC batch APIs, these settings currently have no effect—each insert is still executed as a separate statement.
-
Network and Overhead:
Each individual insert incurs round-trip network latency and server-side processing overhead, which adds up quickly when inserting many rows.
-
Driver Limitations:
The Databricks JDBC driver is not currently optimized for high-throughput, row-by-row inserts. There is an open feature request to improve this.
Recommendations & Optimizations:
-
Bulk Insert Alternatives:
If possible, avoid row-by-row inserts via JDBC. Instead, consider:
-
Writing data to a CSV or Parquet file and using Databricks' bulk load mechanisms (e.g., COPY INTO, or Spark DataFrame writes).
-
Using Databricks' native APIs (like Spark DataFrame .write
) for large data loads.
-
Connection Pooling:
Use a connection pool (e.g., HikariCP, Apache DBCP) to reduce connection overhead.
-
Caching:
If your use case allows, enable smart caching in your JDBC driver to reduce repeated data transfer.
-
Monitor for Updates:
Watch for future updates from Databricks regarding support for true JDBC batch/bulk inserts.
2. Unsupported Data Types in Databricks JDBC Driver
Error:
When inserting Array, Map, and Binary types using parameterized queries, you receive:
java.sql.SQLException: [Databricks][JDBCDriver](500352) Error getting the parameter data type: HIVE_PARAMETER_QUERY_DATA_TYPE_ERR_NON_SUPPORT_DATA_TYPE
Unsupported Data Types:
The Databricks JDBC driver does not support the following data types for parameterized queries:
-
ARRAY
-
MAP
-
BINARY
-
TEXT
-
NVARCHAR
-
Other complex or nested types