<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: JJDBC Insert Performance and Unsupported Data Types in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122283#M3493</link>
    <description>Reply from Saritha_S: findings on slow INSERT performance through the Databricks JDBC driver and on the data types the driver does not support for parameterized queries. The full exchange appears in the thread items below.</description>
    <pubDate>Thu, 19 Jun 2025 18:02:22 GMT</pubDate>
    <dc:creator>Saritha_S</dc:creator>
    <dc:date>2025-06-19T18:02:22Z</dc:date>
    <item>
      <title>JJDBC Insert Performance and Unsupported Data Types</title>
      <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/121966#M3479</link>
      <description>&lt;P&gt;We are reaching out regarding two observations with the Databricks JDBC driver:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;We’ve noticed that each INSERT query is taking approximately 1 second to execute via the JDBC driver (please refer to the attached screenshot). This seems unusually slow for our use case. Could you help us understand the possible reasons for this performance issue? Additionally, please let us know if there are any recommended configuration changes or optimizations we might be missing.&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ankit_kothiya1_0-1750157093748.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17565iF3F3DEE1EEE4B386/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ankit_kothiya1_0-1750157093748.png" alt="ankit_kothiya1_0-1750157093748.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;Could you provide a list of data types that are currently not supported by the Databricks JDBC driver?&lt;BR /&gt;For Array, Map and Binary data type insertion using parameterized query, we are getting below error.&lt;BR /&gt;java.sql.SQLException: [Databricks][JDBCDriver](500352) Error getting the parameter data type: HIVE_PARAMETER_QUERY_DATA_TYPE_ERR_NON_SUPPORT_DATA_TYPE&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;We appreciate your support and look forward to your response.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jun 2025 10:45:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/121966#M3479</guid>
      <dc:creator>ankit_kothiya1</dc:creator>
      <dc:date>2025-06-17T10:45:21Z</dc:date>
    </item>
    <item>
      <title>Re: JJDBC Insert Performance and Unsupported Data Types</title>
      <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122283#M3493</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/170163"&gt;@ankit_kothiya1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please find my findings on your two questions below.&lt;/P&gt;
&lt;H2 id="1-slow-insert-performance-via-databricks-jdbc-driv" class="mb-2 mt-6 text-base font-[500] first:mt-0 md:text-lg dark:font-[475] [hr+&amp;amp;]:mt-4"&gt;1. Slow INSERT Performance via Databricks JDBC Driver&lt;/H2&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Observation:&lt;/STRONG&gt;&lt;BR /&gt;Each INSERT query takes about 1 second via the Databricks JDBC driver, which is unusually slow for high-throughput use cases.&lt;/P&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Possible Reasons:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Row-by-Row Execution:&lt;/STRONG&gt;&lt;BR /&gt;Recent versions of Databricks Runtime (14.x and above) have changed how JDBC insert operations are handled. Instead of batching inserts, each row is inserted individually, resulting in significant overhead and slow performance&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;This is a regression from earlier runtimes (such as 13.1), which supported bulk/batch inserts.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Batch Size Ignored:&lt;/STRONG&gt;&lt;BR /&gt;Even if you set the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;batchsize&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;parameter or use JDBC batch APIs, these settings currently have no effect—each insert is still executed as a separate statement&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Network and Overhead:&lt;/STRONG&gt;&lt;BR /&gt;Each individual insert incurs round-trip network latency and server-side processing overhead, which adds up quickly when inserting many rows&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Driver Limitations:&lt;/STRONG&gt;&lt;BR /&gt;The Databricks JDBC driver is not currently optimized for high-throughput, row-by-row inserts. There is an open feature request to improve this&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
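&lt;P class="my-0"&gt;Since per-statement round trips dominate the cost, one interim workaround is to build multi-row &lt;CODE&gt;INSERT ... VALUES&lt;/CODE&gt; statements client-side so each round trip carries many rows. A minimal sketch in Python; the table and column names are illustrative, and in Java the same SQL strings can be sent through a plain &lt;CODE&gt;Statement&lt;/CODE&gt;:&lt;/P&gt;

```python
# Build multi-row INSERT statements so each round trip carries
# `chunk_size` rows instead of one. Literal formatting below handles
# only simple numeric/string values and is for trusted data.
def quote(value):
    """Render a Python value as a SQL literal (strings get single quotes)."""
    if isinstance(value, str):
        escaped = value.replace("'", "''")
        return f"'{escaped}'"
    return str(value)

def multi_row_inserts(table, columns, rows, chunk_size=500):
    """Yield INSERT statements covering `rows` in chunks of `chunk_size`."""
    cols = ", ".join(columns)
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        values = ", ".join(
            "(" + ", ".join(quote(v) for v in row) + ")" for row in chunk
        )
        yield f"INSERT INTO {table} ({cols}) VALUES {values}"

stmts = list(multi_row_inserts("events", ["id", "name"],
                               [(1, "a"), (2, "b"), (3, "c")], chunk_size=2))
```

&lt;P class="my-0"&gt;Note that inlining literals this way bypasses parameter binding, so it is only appropriate for trusted, pre-validated data.&lt;/P&gt;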
&lt;P class="my-0"&gt;&lt;STRONG&gt;Recommendations &amp;amp; Optimizations:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Bulk Insert Alternatives:&lt;/STRONG&gt;&lt;BR /&gt;If possible, avoid row-by-row inserts via JDBC. Instead, consider:&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Writing data to a CSV or Parquet file and using Databricks' bulk load mechanisms (e.g., COPY INTO, or Spark DataFrame writes).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Using Databricks' native APIs (like Spark DataFrame&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.write&lt;/CODE&gt;) for large data loads.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Connection Pooling:&lt;/STRONG&gt;&lt;BR /&gt;Use a connection pool (e.g., HikariCP, Apache DBCP) to reduce connection overhead&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Caching:&lt;/STRONG&gt;&lt;BR /&gt;If your use case allows, enable smart caching in your JDBC driver to reduce repeated data transfer.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Monitor for Updates:&lt;/STRONG&gt;&lt;BR /&gt;Watch for future updates from Databricks regarding support for true JDBC batch/bulk inserts.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
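&lt;P class="my-0"&gt;The first recommendation can be sketched end to end: stage rows as a file, then issue a single &lt;CODE&gt;COPY INTO&lt;/CODE&gt; so the warehouse performs one bulk load instead of N inserts. A Python sketch; the table name and columns are placeholders, and in practice the staged file would live in cloud storage (S3/ADLS/GCS) readable by the workspace rather than on local disk:&lt;/P&gt;

```python
import csv
import tempfile

def stage_rows_as_csv(rows, header):
    """Write rows to a CSV file and return its path. In a real pipeline
    this file would be uploaded to cloud storage the workspace can read."""
    f = tempfile.NamedTemporaryFile(
        mode="w", newline="", suffix=".csv", delete=False)
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
    f.close()
    return f.name

def copy_into_sql(table, staged_path):
    """Build the COPY INTO statement to execute over JDBC or a SQL warehouse."""
    return (
        f"COPY INTO {table} "
        f"FROM '{staged_path}' "
        "FILEFORMAT = CSV "
        "FORMAT_OPTIONS ('header' = 'true')"
    )

path = stage_rows_as_csv([(1, "a"), (2, "b")], ["id", "name"])
sql = copy_into_sql("main.default.events", path)
```

&lt;P class="my-0"&gt;One bulk load amortizes the per-statement overhead across the whole file, which is why it scales far better than per-row JDBC inserts.&lt;/P&gt;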
&lt;HR /&gt;
&lt;H2 id="2-unsupported-data-types-in-databricks-jdbc-driver" class="mb-2 mt-6 text-base font-[500] first:mt-0 md:text-lg dark:font-[475] [hr+&amp;amp;]:mt-4"&gt;2. Unsupported Data Types in Databricks JDBC Driver&lt;/H2&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Error:&lt;/STRONG&gt;&lt;BR /&gt;When inserting Array, Map, and Binary types using parameterized queries, you receive:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-textMainDark selection:!text-superDark selection:bg-superDuper/10 bg-offset dark:bg-offsetDark my-md relative flex flex-col rounded font-mono text-sm font-thin"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl sticky top-0 flex h-0 items-start justify-end"&gt;
&lt;DIV class="flex items-center min-w-0 font-medium gap-1.5 justify-center"&gt;
&lt;DIV class="flex shrink-0 items-center justify-center size-4"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-text-200 bg-background-300 py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;text&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="pr-lg"&gt;&lt;SPAN&gt;&lt;CODE&gt;java.sql.SQLException: [Databricks][JDBCDriver](500352) Error getting the parameter data type: HIVE_PARAMETER_QUERY_DATA_TYPE_ERR_NON_SUPPORT_DATA_TYPE
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
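&lt;P class="my-0"&gt;Until the driver can bind these types, a common workaround is to avoid parameter markers for the affected columns and instead inline the values through SQL constructor functions: &lt;CODE&gt;array()&lt;/CODE&gt; and &lt;CODE&gt;map()&lt;/CODE&gt; for complex types, and &lt;CODE&gt;unbase64()&lt;/CODE&gt; to send binary as base64 text decoded server-side. A sketch that builds such a statement (the table and column names are illustrative):&lt;/P&gt;

```python
import base64

def sql_literal(value):
    """Render a scalar as a SQL literal (strings get single quotes)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def array_expr(values):
    """Spark SQL array(...) constructor, e.g. array(1, 2, 3)."""
    return "array(" + ", ".join(sql_literal(v) for v in values) + ")"

def map_expr(mapping):
    """Spark SQL map(k1, v1, k2, v2, ...) constructor."""
    parts = []
    for k, v in mapping.items():
        parts.append(sql_literal(k))
        parts.append(sql_literal(v))
    return "map(" + ", ".join(parts) + ")"

def binary_expr(data):
    """Send bytes as base64 text, decoded server-side with unbase64()."""
    b64 = base64.b64encode(data).decode("ascii")
    return f"unbase64('{b64}')"

insert_sql = (
    "INSERT INTO events (tags, attrs, payload) VALUES ("
    + array_expr(["a", "b"]) + ", "
    + map_expr({"k": "v"}) + ", "
    + binary_expr(b"hi") + ")"
)
```

&lt;P class="my-0"&gt;As with any inlined SQL, this bypasses parameter binding, so restrict it to trusted values and escape strings carefully.&lt;/P&gt;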
&lt;P class="my-0"&gt;&lt;STRONG&gt;Unsupported Data Types:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="my-0"&gt;The Databricks JDBC driver does&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;support the following data types for parameterized queries:&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;ARRAY&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;MAP&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;BINARY&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;TEXT&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;NVARCHAR&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Other complex or nested types&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 19 Jun 2025 18:02:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122283#M3493</guid>
      <dc:creator>Saritha_S</dc:creator>
      <dc:date>2025-06-19T18:02:22Z</dc:date>
    </item>
    <item>
      <title>Re: JJDBC Insert Performance and Unsupported Data Types</title>
      <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122316#M3500</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/632"&gt;@Saritha_S&lt;/a&gt;&amp;nbsp;Thanks for the response. We will review the above suggestions to improve insert query performance.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jun 2025 07:01:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122316#M3500</guid>
      <dc:creator>ankit_kothiya1</dc:creator>
      <dc:date>2025-06-20T07:01:27Z</dc:date>
    </item>
  </channel>
</rss>

