<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Unicode converter buffer overflow error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/148243#M52852</link>
    <description>&lt;P&gt;We are currently using Informatica PowerCenter and pulling down data from Databricks PVC using an ODBC connection, and it's been working great.&amp;nbsp; Our company is moving to Databricks SaaS and I am trying to get Informatica PowerCenter to connect to SaaS and pull down data using ODBC as well.&amp;nbsp; The problem is that with SaaS we keep getting a Unicode converter buffer overflow error.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have seen that we can add an entry in our ODBC file (&lt;SPAN&gt;DriverUnicodeType=1), but it did not resolve the error.&amp;nbsp; Our Informatica server is a Linux box with RHEL 7.&amp;nbsp; We tried using the Databricks ODBC driver version 2.9.2 but get this error when trying to install:&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;version `GLIBC_2.27' not found (required by /opt/simba/spark/lib/64/libsparkodbc_sb64.so)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;All I have found on this error is that the Linux OS needs to be upgraded.&amp;nbsp; Our company is currently using Databricks PVC, but everyone is being migrated to SaaS by March.&amp;nbsp; They are not willing to upgrade the Informatica Linux OS because it will be sunset once our group moves to SaaS, but we are not moving until June/July, so there is a gap of a few months where we still need Informatica to work and pull from SaaS.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are there any other options we can try to get our Informatica ODBC connection to work?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 12 Feb 2026 23:46:37 GMT</pubDate>
    <dc:creator>kmcas10</dc:creator>
    <dc:date>2026-02-12T23:46:37Z</dc:date>
    <item>
      <title>Unicode converter buffer overflow error.</title>
      <link>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/148243#M52852</link>
      <description>&lt;P&gt;We are currently using Informatica PowerCenter and pulling down data from Databricks PVC using an ODBC connection, and it's been working great.&amp;nbsp; Our company is moving to Databricks SaaS and I am trying to get Informatica PowerCenter to connect to SaaS and pull down data using ODBC as well.&amp;nbsp; The problem is that with SaaS we keep getting a Unicode converter buffer overflow error.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have seen that we can add an entry in our ODBC file (&lt;SPAN&gt;DriverUnicodeType=1), but it did not resolve the error.&amp;nbsp; Our Informatica server is a Linux box with RHEL 7.&amp;nbsp; We tried using the Databricks ODBC driver version 2.9.2 but get this error when trying to install:&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;version `GLIBC_2.27' not found (required by /opt/simba/spark/lib/64/libsparkodbc_sb64.so)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;All I have found on this error is that the Linux OS needs to be upgraded.&amp;nbsp; Our company is currently using Databricks PVC, but everyone is being migrated to SaaS by March.&amp;nbsp; They are not willing to upgrade the Informatica Linux OS because it will be sunset once our group moves to SaaS, but we are not moving until June/July, so there is a gap of a few months where we still need Informatica to work and pull from SaaS.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are there any other options we can try to get our Informatica ODBC connection to work?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Feb 2026 23:46:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/148243#M52852</guid>
      <dc:creator>kmcas10</dc:creator>
      <dc:date>2026-02-12T23:46:37Z</dc:date>
    </item>
    <item>
      <title>Re: Unicode converter buffer overflow error.</title>
      <link>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/150127#M53257</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/215953"&gt;@kmcas10&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Let me share some guidance on this. This is a scenario that comes up when transitioning from Databricks PVC (private cloud) to SaaS with legacy ODBC tooling, and there are several things you can try to bridge the gap until your full migration.&lt;/P&gt;
&lt;P&gt;Let me break down the two issues you are dealing with and the options for each.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;ISSUE 1: THE UNICODE CONVERTER BUFFER OVERFLOW ERROR&lt;/P&gt;
&lt;P&gt;This error typically occurs when there is an encoding mismatch between how the Databricks ODBC driver returns string data and how Informatica PowerCenter expects to receive it. The Simba Spark ODBC driver on Linux defaults to UTF-32 encoding, while Informatica PowerCenter often expects UTF-16. When the 4-byte-per-character UTF-32 data is received into a buffer sized for 2-byte-per-character UTF-16, you get the buffer overflow.&lt;/P&gt;
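&lt;P&gt;You can see the width mismatch directly with iconv in any Linux shell (the single character 'A' stands in for real column data):&lt;/P&gt;

```shell
# One character encoded both ways: UTF-16 uses 2 bytes, UTF-32 uses 4
printf 'A' | iconv -f UTF-8 -t UTF-16LE | wc -c   # prints 2
printf 'A' | iconv -f UTF-8 -t UTF-32LE | wc -c   # prints 4
```

&lt;P&gt;A receive buffer sized from the 2-byte-per-character count overflows as soon as the 4-byte-per-character stream arrives, which is consistent with the error you are seeing.&lt;/P&gt;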
&lt;P&gt;You mentioned trying DriverUnicodeType=1 in the ODBC configuration, which is the right direction. Here is the full set of options to try:&lt;/P&gt;
&lt;P&gt;1. Confirm DriverUnicodeType is set correctly in the right location. This setting should go in the DSN section of your odbc.ini file (not just simba.sparkodbc.ini):&lt;/P&gt;
&lt;P&gt;[YourDatabricksDSN]&lt;BR /&gt;Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so&lt;BR /&gt;Host=your-workspace.cloud.databricks.com&lt;BR /&gt;Port=443&lt;BR /&gt;HTTPPath=/sql/1.0/warehouses/your-warehouse-id&lt;BR /&gt;SSL=1&lt;BR /&gt;ThriftTransport=2&lt;BR /&gt;AuthMech=3&lt;BR /&gt;UID=token&lt;BR /&gt;PWD=your-personal-access-token&lt;BR /&gt;DriverUnicodeType=1&lt;/P&gt;
&lt;P&gt;2. Also try adding StringColumnLength to limit the reported column size. Informatica allocates memory based on the column metadata returned by the driver. If the driver reports very large string columns, Informatica may overflow its internal buffer. Add this to your DSN:&lt;/P&gt;
&lt;P&gt;StringColumnLength=32768&lt;/P&gt;
&lt;P&gt;You can lower this further if your data does not have very long strings (e.g., 4096 or 16384).&lt;/P&gt;
&lt;P&gt;3. Set the locale and encoding environment variables before starting Informatica. Make sure your Linux session uses UTF-8:&lt;/P&gt;
&lt;P&gt;export LANG=en_US.UTF-8&lt;BR /&gt;export LC_ALL=en_US.UTF-8&lt;/P&gt;
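&lt;P&gt;To confirm the settings actually took effect in the shell that launches the Informatica services, re-check the locale afterwards (a quick sanity check; adjust the locale name if your site uses a different UTF-8 variant):&lt;/P&gt;

```shell
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
# Every line of the output should now show a UTF-8 locale; a stray "POSIX"
# or "C" entry means the exports did not happen in the right shell
locale
```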
&lt;P&gt;4. In Informatica PowerCenter, check your session properties for the connection. There is typically an option to set the "Code Page" or "Connection Code Page" on the ODBC connection object. Make sure it is set to UTF-8 (65001) rather than a Latin/ANSI code page.&lt;/P&gt;
&lt;P&gt;5. If the above still does not resolve it, try adding these additional parameters to your odbc.ini DSN section:&lt;/P&gt;
&lt;P&gt;UseUnicodeSqlCharacterTypes=1&lt;BR /&gt;DefaultStringColumnLength=4096&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;ISSUE 2: GLIBC_2.27 REQUIREMENT WITH DRIVER 2.9.2&lt;/P&gt;
&lt;P&gt;You are correct that newer versions of the Simba Spark ODBC driver (roughly version 2.8.x and above) require GLIBC 2.27 or later, which means RHEL 8 or newer. RHEL 7 ships with GLIBC 2.17, which is why you are seeing that error.&lt;/P&gt;
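&lt;P&gt;You can confirm this from the shell before downloading anything:&lt;/P&gt;

```shell
# Print the system C library version; RHEL 7 reports 2.17,
# while driver 2.8.x and later need 2.27 or newer
getconf GNU_LIBC_VERSION
ldd --version | head -n1
```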
&lt;P&gt;Here are your options:&lt;/P&gt;
&lt;P&gt;Option A: Use an older driver version that supports RHEL 7&lt;/P&gt;
&lt;P&gt;The Simba Spark ODBC driver versions 2.7.x and earlier typically support GLIBC 2.17 (RHEL 7). You can download archived driver versions from the Databricks ODBC Drivers archive page (you will need to sign in to your Databricks account):&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.databricks.com/spark/odbc-drivers-archive" target="_blank"&gt;https://www.databricks.com/spark/odbc-drivers-archive&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Look for version 2.7.7 or 2.7.5 -- these should install and run on RHEL 7 without the GLIBC issue. The older driver still supports connecting to Databricks SaaS workspaces. Make sure to use the correct authentication (personal access token or OAuth) and set the Host, Port, HTTPPath, and SSL parameters for your SaaS workspace.&lt;/P&gt;
&lt;P&gt;Option B: Use the Databricks JDBC driver instead of ODBC&lt;/P&gt;
&lt;P&gt;If Informatica PowerCenter on your system supports JDBC connections (which it does), you can switch to the Databricks JDBC driver as a workaround. The JDBC driver is a pure Java library with no native OS dependencies, so it runs on any Linux version with a supported JRE. You can download it here:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/integrations/jdbc-oss/" target="_blank"&gt;https://docs.databricks.com/en/integrations/jdbc-oss/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This avoids the GLIBC issue entirely and also typically avoids the Unicode buffer overflow since Java handles string encoding natively.&lt;/P&gt;
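&lt;P&gt;For reference, the JDBC URL for a SaaS SQL warehouse follows this shape (every value below is a placeholder for your own workspace details):&lt;/P&gt;

```shell
# Placeholder values throughout -- substitute your workspace host,
# warehouse HTTP path, and personal access token
JDBC_URL='jdbc:databricks://your-workspace.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/your-warehouse-id;AuthMech=3;UID=token;PWD=your-personal-access-token'
echo "$JDBC_URL"
```

&lt;P&gt;Register the driver JAR with Informatica's JDBC connection type and point it at a URL like this one.&lt;/P&gt;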
&lt;P&gt;Option C: Use a container or separate jump box&lt;/P&gt;
&lt;P&gt;If neither of the above works, you could set up a lightweight RHEL 8 or Ubuntu container (using Docker or Podman) on your Informatica server and run the ODBC connection through it. This lets you use the latest driver without upgrading the host OS.&lt;/P&gt;
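&lt;P&gt;A minimal sketch of such a container image, using Red Hat's freely redistributable UBI 8 base (the base image and unixODBC package are standard, but treat the rest as a starting point -- the Simba driver RPM and your odbc.ini still have to be added):&lt;/P&gt;

```dockerfile
FROM registry.access.redhat.com/ubi8/ubi
# unixODBC supplies odbcinst and isql for testing the DSN
RUN dnf install -y unixODBC
# COPY the Simba Spark ODBC 2.9.x RPM here and install it with dnf,
# then mount or COPY your odbc.ini when running the container
```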
&lt;P&gt;&lt;BR /&gt;RECOMMENDED APPROACH FOR YOUR SITUATION&lt;/P&gt;
&lt;P&gt;Given that you need this to work for only a few months (March through June/July), I would suggest:&lt;/P&gt;
&lt;P&gt;1. First, try the Unicode configuration fixes above (DriverUnicodeType=1, StringColumnLength, code page settings) with your current driver. The difference between PVC and SaaS connectivity may just be a metadata difference in how the SaaS SQL warehouse reports column sizes.&lt;/P&gt;
&lt;P&gt;2. If that does not resolve the Unicode error, download an older driver version (2.7.x) from the archive that is compatible with RHEL 7 and GLIBC 2.17.&lt;/P&gt;
&lt;P&gt;3. If you want the cleanest path, consider switching to the JDBC driver in Informatica, which eliminates both the GLIBC dependency and the Unicode encoding mismatch.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;KEY DIFFERENCE BETWEEN PVC AND SAAS&lt;/P&gt;
&lt;P&gt;The reason this may have worked on PVC but not SaaS is likely related to differences in the SQL endpoint/warehouse configuration or TLS/SSL handling between PVC and SaaS environments. SaaS workspaces may also return slightly different column metadata (particularly for string column lengths) that triggers the buffer allocation mismatch in Informatica. The configuration tweaks above should account for these differences.&lt;/P&gt;
&lt;P&gt;I hope this helps you bridge the gap. If you can share which specific Simba ODBC driver version is currently installed and working on your PVC connection, that would help narrow things down further -- you could potentially use that same version pointed at your SaaS workspace endpoint.&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 04:39:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/150127#M53257</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T04:39:50Z</dc:date>
    </item>
  </channel>
</rss>

