<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: JJDBC Insert Performance and Unsupported Data Types in Administration &amp; Architecture</title>
    <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122283#M3493</link>
    <description>Reply from Saritha_S: findings on slow INSERT performance through the Databricks JDBC driver and on the data types the driver does not support for parameterized queries. The full exchange appears in the thread items below.</description>
    <pubDate>Thu, 19 Jun 2025 18:02:22 GMT</pubDate>
    <dc:creator>Saritha_S</dc:creator>
    <dc:date>2025-06-19T18:02:22Z</dc:date>
    <item>
      <title>JJDBC Insert Performance and Unsupported Data Types</title>
      <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/121966#M3479</link>
      <description>&lt;P&gt;We are reaching out regarding two observations with the Databricks JDBC driver:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;We’ve noticed that each INSERT query is taking approximately 1 second to execute via the JDBC driver (please refer to the attached screenshot). This seems unusually slow for our use case. Could you help us understand the possible reasons for this performance issue? Additionally, please let us know if there are any recommended configuration changes or optimizations we might be missing.&lt;BR /&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="ankit_kothiya1_0-1750157093748.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/17565iF3F3DEE1EEE4B386/image-size/medium?v=v2&amp;amp;px=400" role="button" title="ankit_kothiya1_0-1750157093748.png" alt="ankit_kothiya1_0-1750157093748.png" /&gt;&lt;/span&gt;&lt;/LI&gt;&lt;LI&gt;Could you provide a list of data types that are currently not supported by the Databricks JDBC driver?&lt;BR /&gt;For Array, Map and Binary data type insertion using parameterized query, we are getting below error.&lt;BR /&gt;java.sql.SQLException: [Databricks][JDBCDriver](500352) Error getting the parameter data type: HIVE_PARAMETER_QUERY_DATA_TYPE_ERR_NON_SUPPORT_DATA_TYPE&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;We appreciate your support and look forward to your response.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jun 2025 10:45:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/121966#M3479</guid>
      <dc:creator>ankit_kothiya1</dc:creator>
      <dc:date>2025-06-17T10:45:21Z</dc:date>
    </item>
    <item>
      <title>Re: JJDBC Insert Performance and Unsupported Data Types</title>
      <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122283#M3493</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/170163"&gt;@ankit_kothiya1&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Please find my findings on your two questions below.&lt;/P&gt;
&lt;H2 id="1-slow-insert-performance-via-databricks-jdbc-driv" class="mb-2 mt-6 text-base font-[500] first:mt-0 md:text-lg dark:font-[475] [hr+&amp;amp;]:mt-4"&gt;1. Slow INSERT Performance via Databricks JDBC Driver&lt;/H2&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Observation:&lt;/STRONG&gt;&lt;BR /&gt;Each INSERT query takes about 1 second via the Databricks JDBC driver, which is unusually slow for high-throughput use cases.&lt;/P&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Possible Reasons:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Row-by-Row Execution:&lt;/STRONG&gt;&lt;BR /&gt;Recent versions of Databricks Runtime (14.x and above) have changed how JDBC insert operations are handled. Instead of batching inserts, each row is inserted individually, resulting in significant overhead and slow performance&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;This is a regression from earlier runtimes (such as 13.1), which supported bulk/batch inserts.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Batch Size Ignored:&lt;/STRONG&gt;&lt;BR /&gt;Even if you set the&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;batchsize&lt;/CODE&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;parameter or use JDBC batch APIs, these settings currently have no effect—each insert is still executed as a separate statement&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Network and Overhead:&lt;/STRONG&gt;&lt;BR /&gt;Each individual insert incurs round-trip network latency and server-side processing overhead, which adds up quickly when inserting many rows&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Driver Limitations:&lt;/STRONG&gt;&lt;BR /&gt;The Databricks JDBC driver is not currently optimized for high-throughput, row-by-row inserts. There is an open feature request to improve this&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
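&lt;P class="my-0"&gt;Since per-statement round trips dominate the cost, one interim workaround is to build multi-row &lt;CODE&gt;INSERT ... VALUES&lt;/CODE&gt; statements client-side so each round trip carries many rows. A minimal sketch in Python; the table and column names are illustrative, and in Java the same SQL strings can be sent through a plain &lt;CODE&gt;Statement&lt;/CODE&gt;:&lt;/P&gt;

```python
# Build multi-row INSERT statements so each round trip carries
# `chunk_size` rows instead of one. Literal formatting below handles
# only simple numeric/string values and is for trusted data.
def quote(value):
    """Render a Python value as a SQL literal (strings get single quotes)."""
    if isinstance(value, str):
        escaped = value.replace("'", "''")
        return f"'{escaped}'"
    return str(value)

def multi_row_inserts(table, columns, rows, chunk_size=500):
    """Yield INSERT statements covering `rows` in chunks of `chunk_size`."""
    cols = ", ".join(columns)
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        values = ", ".join(
            "(" + ", ".join(quote(v) for v in row) + ")" for row in chunk
        )
        yield f"INSERT INTO {table} ({cols}) VALUES {values}"

stmts = list(multi_row_inserts("events", ["id", "name"],
                               [(1, "a"), (2, "b"), (3, "c")], chunk_size=2))
```

&lt;P class="my-0"&gt;Note that inlining literals this way bypasses parameter binding, so it is only appropriate for trusted, pre-validated data.&lt;/P&gt;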
&lt;P class="my-0"&gt;&lt;STRONG&gt;Recommendations &amp;amp; Optimizations:&lt;/STRONG&gt;&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Bulk Insert Alternatives:&lt;/STRONG&gt;&lt;BR /&gt;If possible, avoid row-by-row inserts via JDBC. Instead, consider:&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Writing data to a CSV or Parquet file and using Databricks' bulk load mechanisms (e.g., COPY INTO, or Spark DataFrame writes).&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Using Databricks' native APIs (like Spark DataFrame&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;CODE&gt;.write&lt;/CODE&gt;) for large data loads.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Connection Pooling:&lt;/STRONG&gt;&lt;BR /&gt;Use a connection pool (e.g., HikariCP, Apache DBCP) to reduce connection overhead&lt;SPAN class="whitespace-nowrap"&gt;.&lt;/SPAN&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Caching:&lt;/STRONG&gt;&lt;BR /&gt;If your use case allows, enable smart caching in your JDBC driver to reduce repeated data transfer.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Monitor for Updates:&lt;/STRONG&gt;&lt;BR /&gt;Watch for future updates from Databricks regarding support for true JDBC batch/bulk inserts.&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;
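&lt;P class="my-0"&gt;The first recommendation can be sketched end to end: stage rows as a file, then issue a single &lt;CODE&gt;COPY INTO&lt;/CODE&gt; so the warehouse performs one bulk load instead of N inserts. A Python sketch; the table name and columns are placeholders, and in practice the staged file would live in cloud storage (S3/ADLS/GCS) readable by the workspace rather than on local disk:&lt;/P&gt;

```python
import csv
import tempfile

def stage_rows_as_csv(rows, header):
    """Write rows to a CSV file and return its path. In a real pipeline
    this file would be uploaded to cloud storage the workspace can read."""
    f = tempfile.NamedTemporaryFile(
        mode="w", newline="", suffix=".csv", delete=False)
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)
    f.close()
    return f.name

def copy_into_sql(table, staged_path):
    """Build the COPY INTO statement to execute over JDBC or a SQL warehouse."""
    return (
        f"COPY INTO {table} "
        f"FROM '{staged_path}' "
        "FILEFORMAT = CSV "
        "FORMAT_OPTIONS ('header' = 'true')"
    )

path = stage_rows_as_csv([(1, "a"), (2, "b")], ["id", "name"])
sql = copy_into_sql("main.default.events", path)
```

&lt;P class="my-0"&gt;One bulk load amortizes the per-statement overhead across the whole file, which is why it scales far better than per-row JDBC inserts.&lt;/P&gt;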
&lt;HR /&gt;
&lt;H2 id="2-unsupported-data-types-in-databricks-jdbc-driver" class="mb-2 mt-6 text-base font-[500] first:mt-0 md:text-lg dark:font-[475] [hr+&amp;amp;]:mt-4"&gt;2. Unsupported Data Types in Databricks JDBC Driver&lt;/H2&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;Error:&lt;/STRONG&gt;&lt;BR /&gt;When inserting Array, Map, and Binary types using parameterized queries, you receive:&lt;/P&gt;
&lt;DIV class="w-full md:max-w-[90vw]"&gt;
&lt;DIV class="codeWrapper text-textMainDark selection:!text-superDark selection:bg-superDuper/10 bg-offset dark:bg-offsetDark my-md relative flex flex-col rounded font-mono text-sm font-thin"&gt;
&lt;DIV class="translate-y-xs -translate-x-xs bottom-xl mb-xl sticky top-0 flex h-0 items-start justify-end"&gt;
&lt;DIV class="flex items-center min-w-0 font-medium gap-1.5 justify-center"&gt;
&lt;DIV class="flex shrink-0 items-center justify-center size-4"&gt;&amp;nbsp;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="-mt-xl"&gt;
&lt;DIV&gt;
&lt;DIV class="text-text-200 bg-background-300 py-xs px-sm inline-block rounded-br rounded-tl-[3px] font-thin" data-testid="code-language-indicator"&gt;text&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;DIV class="pr-lg"&gt;&lt;SPAN&gt;&lt;CODE&gt;java.sql.SQLException: [Databricks][JDBCDriver](500352) Error getting the parameter data type: HIVE_PARAMETER_QUERY_DATA_TYPE_ERR_NON_SUPPORT_DATA_TYPE
&lt;/CODE&gt;&lt;/SPAN&gt;&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
&lt;/DIV&gt;
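&lt;P class="my-0"&gt;Until the driver can bind these types, a common workaround is to avoid parameter markers for the affected columns and instead inline the values through SQL constructor functions: &lt;CODE&gt;array()&lt;/CODE&gt; and &lt;CODE&gt;map()&lt;/CODE&gt; for complex types, and &lt;CODE&gt;unbase64()&lt;/CODE&gt; to send binary as base64 text decoded server-side. A sketch that builds such a statement (the table and column names are illustrative):&lt;/P&gt;

```python
import base64

def sql_literal(value):
    """Render a scalar as a SQL literal (strings get single quotes)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)

def array_expr(values):
    """Spark SQL array(...) constructor, e.g. array(1, 2, 3)."""
    return "array(" + ", ".join(sql_literal(v) for v in values) + ")"

def map_expr(mapping):
    """Spark SQL map(k1, v1, k2, v2, ...) constructor."""
    parts = []
    for k, v in mapping.items():
        parts.append(sql_literal(k))
        parts.append(sql_literal(v))
    return "map(" + ", ".join(parts) + ")"

def binary_expr(data):
    """Send bytes as base64 text, decoded server-side with unbase64()."""
    b64 = base64.b64encode(data).decode("ascii")
    return f"unbase64('{b64}')"

insert_sql = (
    "INSERT INTO events (tags, attrs, payload) VALUES ("
    + array_expr(["a", "b"]) + ", "
    + map_expr({"k": "v"}) + ", "
    + binary_expr(b"hi") + ")"
)
```

&lt;P class="my-0"&gt;As with any inlined SQL, this bypasses parameter binding, so restrict it to trusted values and escape strings carefully.&lt;/P&gt;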
&lt;P class="my-0"&gt;&lt;STRONG&gt;Unsupported Data Types:&lt;/STRONG&gt;&lt;/P&gt;
&lt;P class="my-0"&gt;The Databricks JDBC driver does&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;not&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;support the following data types for parameterized queries:&lt;/P&gt;
&lt;UL class="marker:text-textOff list-disc"&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;ARRAY&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;MAP&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;BINARY&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;TEXT&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;&lt;STRONG&gt;NVARCHAR&lt;/STRONG&gt;&lt;/P&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;P class="my-0"&gt;Other complex or nested types&lt;/P&gt;
&lt;/LI&gt;
&lt;/UL&gt;</description>
      <pubDate>Thu, 19 Jun 2025 18:02:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122283#M3493</guid>
      <dc:creator>Saritha_S</dc:creator>
      <dc:date>2025-06-19T18:02:22Z</dc:date>
    </item>
    <item>
      <title>Re: JJDBC Insert Performance and Unsupported Data Types</title>
      <link>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122316#M3500</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/632"&gt;@Saritha_S&lt;/a&gt;&amp;nbsp;Thanks for the response. We will review the above suggestions to improve insert query performance.&lt;/P&gt;</description>
      <pubDate>Fri, 20 Jun 2025 07:01:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/administration-architecture/jjdbc-insert-performance-and-unsupported-data-types/m-p/122316#M3500</guid>
      <dc:creator>ankit_kothiya1</dc:creator>
      <dc:date>2025-06-20T07:01:27Z</dc:date>
    </item>
  </channel>
</rss>

