<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Unicode converter buffer overflow error in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/148243#M52852</link>
    <description>&lt;P&gt;We are currently using Informatica PowerCenter and pulling down data from Databricks PVC using an ODBC connection, and it's been working great.&amp;nbsp; Our company is moving to Databricks SaaS and I am trying to get Informatica PowerCenter to connect to SaaS and pull down data using ODBC as well.&amp;nbsp; The problem is that with SaaS we keep getting a Unicode converter buffer overflow error.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have seen that we can add an entry in our ODBC file (&lt;SPAN&gt;DriverUnicodeType=1), but it did not resolve the error.&amp;nbsp; Our Informatica server is a Linux box with RHEL 7.&amp;nbsp; We tried using the Databricks ODBC driver version 2.9.2 but get this error when trying to install:&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;version `GLIBC_2.27' not found (required by /opt/simba/spark/lib/64/libsparkodbc_sb64.so)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;All I have found on this error is that the Linux OS needs to be upgraded.&amp;nbsp; Our company is currently using Databricks PVC, but everyone is being migrated to SaaS by March.&amp;nbsp; They are not willing to upgrade the Informatica Linux OS because it will be sunset once our group moves to SaaS, but we are not moving until June/July, so there is a gap of a few months where we still need Informatica to work and pull from SaaS.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are there any other options we can try to get our Informatica ODBC connection to work?&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 12 Feb 2026 23:46:37 GMT</pubDate>
    <dc:creator>kmcas10</dc:creator>
    <dc:date>2026-02-12T23:46:37Z</dc:date>
    <item>
      <title>Unicode converter buffer overflow error.</title>
      <link>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/148243#M52852</link>
      <description>&lt;P&gt;We are currently using Informatica PowerCenter and pulling down data from Databricks PVC using an ODBC connection, and it's been working great.&amp;nbsp; Our company is moving to Databricks SaaS and I am trying to get Informatica PowerCenter to connect to SaaS and pull down data using ODBC as well.&amp;nbsp; The problem is that with SaaS we keep getting a Unicode converter buffer overflow error.&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have seen that we can add an entry in our ODBC file (&lt;SPAN&gt;DriverUnicodeType=1), but it did not resolve the error.&amp;nbsp; Our Informatica server is a Linux box with RHEL 7.&amp;nbsp; We tried using the Databricks ODBC driver version 2.9.2 but get this error when trying to install:&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&amp;nbsp;version `GLIBC_2.27' not found (required by /opt/simba/spark/lib/64/libsparkodbc_sb64.so)&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;All I have found on this error is that the Linux OS needs to be upgraded.&amp;nbsp; Our company is currently using Databricks PVC, but everyone is being migrated to SaaS by March.&amp;nbsp; They are not willing to upgrade the Informatica Linux OS because it will be sunset once our group moves to SaaS, but we are not moving until June/July, so there is a gap of a few months where we still need Informatica to work and pull from SaaS.&amp;nbsp;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Are there any other options we can try to get our Informatica ODBC connection to work?&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 12 Feb 2026 23:46:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/148243#M52852</guid>
      <dc:creator>kmcas10</dc:creator>
      <dc:date>2026-02-12T23:46:37Z</dc:date>
    </item>
    <item>
      <title>Re: Unicode converter buffer overflow error.</title>
      <link>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/150127#M53257</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/215953"&gt;@kmcas10&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;Let me share some guidance on this. This is a scenario that comes up when transitioning from Databricks PVC (private cloud) to SaaS with legacy ODBC tooling, and there are several things you can try to bridge the gap until your full migration.&lt;/P&gt;
&lt;P&gt;Let me break down the two issues you are dealing with and the options for each.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;ISSUE 1: THE UNICODE CONVERTER BUFFER OVERFLOW ERROR&lt;/P&gt;
&lt;P&gt;This error typically occurs when there is an encoding mismatch between how the Databricks ODBC driver returns string data and how Informatica PowerCenter expects to receive it. The Simba Spark ODBC driver on Linux defaults to UTF-32 encoding, while Informatica PowerCenter often expects UTF-16. When the 4-byte-per-character UTF-32 data is received into a buffer sized for 2-byte-per-character UTF-16, you get the buffer overflow.&lt;/P&gt;
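&lt;P&gt;You can see the width mismatch directly with iconv in any Linux shell (the single character 'A' stands in for real column data):&lt;/P&gt;

```shell
# One character encoded both ways: UTF-16 uses 2 bytes, UTF-32 uses 4
printf 'A' | iconv -f UTF-8 -t UTF-16LE | wc -c   # prints 2
printf 'A' | iconv -f UTF-8 -t UTF-32LE | wc -c   # prints 4
```

&lt;P&gt;A receive buffer sized from the 2-byte-per-character count overflows as soon as the 4-byte-per-character stream arrives, which is consistent with the error you are seeing.&lt;/P&gt;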
&lt;P&gt;You mentioned trying DriverUnicodeType=1 in the ODBC configuration, which is the right direction. Here is the full set of options to try:&lt;/P&gt;
&lt;P&gt;1. Confirm DriverUnicodeType is set correctly in the right location. This setting should go in the DSN section of your odbc.ini file (not just simba.sparkodbc.ini):&lt;/P&gt;
&lt;P&gt;[YourDatabricksDSN]&lt;BR /&gt;Driver=/opt/simba/spark/lib/64/libsparkodbc_sb64.so&lt;BR /&gt;Host=your-workspace.cloud.databricks.com&lt;BR /&gt;Port=443&lt;BR /&gt;HTTPPath=/sql/1.0/warehouses/your-warehouse-id&lt;BR /&gt;SSL=1&lt;BR /&gt;ThriftTransport=2&lt;BR /&gt;AuthMech=3&lt;BR /&gt;UID=token&lt;BR /&gt;PWD=your-personal-access-token&lt;BR /&gt;DriverUnicodeType=1&lt;/P&gt;
&lt;P&gt;2. Also try adding StringColumnLength to limit the reported column size. Informatica allocates memory based on the column metadata returned by the driver. If the driver reports very large string columns, Informatica may overflow its internal buffer. Add this to your DSN:&lt;/P&gt;
&lt;P&gt;StringColumnLength=32768&lt;/P&gt;
&lt;P&gt;You can lower this further if your data does not have very long strings (e.g., 4096 or 16384).&lt;/P&gt;
&lt;P&gt;3. Set the locale and encoding environment variables before starting Informatica. Make sure your Linux session uses UTF-8:&lt;/P&gt;
&lt;P&gt;export LANG=en_US.UTF-8&lt;BR /&gt;export LC_ALL=en_US.UTF-8&lt;/P&gt;
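&lt;P&gt;To confirm the settings actually took effect in the shell that launches the Informatica services, re-check the locale afterwards (a quick sanity check; adjust the locale name if your site uses a different UTF-8 variant):&lt;/P&gt;

```shell
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
# Every line of the output should now show a UTF-8 locale; a stray "POSIX"
# or "C" entry means the exports did not happen in the right shell
locale
```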
&lt;P&gt;4. In Informatica PowerCenter, check your session properties for the connection. There is typically an option to set the "Code Page" or "Connection Code Page" on the ODBC connection object. Make sure it is set to UTF-8 (65001) rather than a Latin/ANSI code page.&lt;/P&gt;
&lt;P&gt;5. If the above still does not resolve it, try adding these additional parameters to your odbc.ini DSN section:&lt;/P&gt;
&lt;P&gt;UseUnicodeSqlCharacterTypes=1&lt;BR /&gt;DefaultStringColumnLength=4096&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;ISSUE 2: GLIBC_2.27 REQUIREMENT WITH DRIVER 2.9.2&lt;/P&gt;
&lt;P&gt;You are correct that newer versions of the Simba Spark ODBC driver (roughly version 2.8.x and above) require GLIBC 2.27 or later, which means RHEL 8 or newer. RHEL 7 ships with GLIBC 2.17, which is why you are seeing that error.&lt;/P&gt;
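&lt;P&gt;You can confirm this from the shell before downloading anything:&lt;/P&gt;

```shell
# Print the system C library version; RHEL 7 reports 2.17,
# while driver 2.8.x and later need 2.27 or newer
getconf GNU_LIBC_VERSION
ldd --version | head -n1
```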
&lt;P&gt;Here are your options:&lt;/P&gt;
&lt;P&gt;Option A: Use an older driver version that supports RHEL 7&lt;/P&gt;
&lt;P&gt;The Simba Spark ODBC driver versions 2.7.x and earlier typically support GLIBC 2.17 (RHEL 7). You can download archived driver versions from the Databricks ODBC Drivers archive page (you will need to sign in to your Databricks account):&lt;/P&gt;
&lt;P&gt;&lt;A href="https://www.databricks.com/spark/odbc-drivers-archive" target="_blank"&gt;https://www.databricks.com/spark/odbc-drivers-archive&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Look for version 2.7.7 or 2.7.5 -- these should install and run on RHEL 7 without the GLIBC issue. The older driver still supports connecting to Databricks SaaS workspaces. Make sure to use the correct authentication (personal access token or OAuth) and set the Host, Port, HTTPPath, and SSL parameters for your SaaS workspace.&lt;/P&gt;
&lt;P&gt;Option B: Use the Databricks JDBC driver instead of ODBC&lt;/P&gt;
&lt;P&gt;If Informatica PowerCenter on your system supports JDBC connections (which it does), you can switch to the Databricks JDBC driver as a workaround. The JDBC driver is a pure Java library with no native OS dependencies, so it runs on any Linux version with a supported JRE. You can download it here:&lt;/P&gt;
&lt;P&gt;&lt;A href="https://docs.databricks.com/en/integrations/jdbc-oss/" target="_blank"&gt;https://docs.databricks.com/en/integrations/jdbc-oss/&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;This avoids the GLIBC issue entirely and also typically avoids the Unicode buffer overflow since Java handles string encoding natively.&lt;/P&gt;
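&lt;P&gt;For reference, the JDBC URL for a SaaS SQL warehouse follows this shape (every value below is a placeholder for your own workspace details):&lt;/P&gt;

```shell
# Placeholder values throughout -- substitute your workspace host,
# warehouse HTTP path, and personal access token
JDBC_URL='jdbc:databricks://your-workspace.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=/sql/1.0/warehouses/your-warehouse-id;AuthMech=3;UID=token;PWD=your-personal-access-token'
echo "$JDBC_URL"
```

&lt;P&gt;Register the driver JAR with Informatica's JDBC connection type and point it at a URL like this one.&lt;/P&gt;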
&lt;P&gt;Option C: Use a container or separate jump box&lt;/P&gt;
&lt;P&gt;If neither of the above works, you could set up a lightweight RHEL 8 or Ubuntu container (using Docker or Podman) on your Informatica server and run the ODBC connection through it. This lets you use the latest driver without upgrading the host OS.&lt;/P&gt;
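&lt;P&gt;A minimal sketch of such a container image, using Red Hat's freely redistributable UBI 8 base (the base image and unixODBC package are standard, but treat the rest as a starting point -- the Simba driver RPM and your odbc.ini still have to be added):&lt;/P&gt;

```dockerfile
FROM registry.access.redhat.com/ubi8/ubi
# unixODBC supplies odbcinst and isql for testing the DSN
RUN dnf install -y unixODBC
# COPY the Simba Spark ODBC 2.9.x RPM here and install it with dnf,
# then mount or COPY your odbc.ini when running the container
```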
&lt;P&gt;&lt;BR /&gt;RECOMMENDED APPROACH FOR YOUR SITUATION&lt;/P&gt;
&lt;P&gt;Given that you need this to work for only a few months (March through June/July), I would suggest:&lt;/P&gt;
&lt;P&gt;1. First, try the Unicode configuration fixes above (DriverUnicodeType=1, StringColumnLength, code page settings) with your current driver. The difference between PVC and SaaS connectivity may just be a metadata difference in how the SaaS SQL warehouse reports column sizes.&lt;/P&gt;
&lt;P&gt;2. If that does not resolve the Unicode error, download an older driver version (2.7.x) from the archive that is compatible with RHEL 7 and GLIBC 2.17.&lt;/P&gt;
&lt;P&gt;3. If you want the cleanest path, consider switching to the JDBC driver in Informatica, which eliminates both the GLIBC dependency and the Unicode encoding mismatch.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;KEY DIFFERENCE BETWEEN PVC AND SAAS&lt;/P&gt;
&lt;P&gt;The reason this may have worked on PVC but not SaaS is likely related to differences in the SQL endpoint/warehouse configuration or TLS/SSL handling between PVC and SaaS environments. SaaS workspaces may also return slightly different column metadata (particularly for string column lengths) that triggers the buffer allocation mismatch in Informatica. The configuration tweaks above should account for these differences.&lt;/P&gt;
&lt;P&gt;I hope this helps you bridge the gap. If you can share which specific Simba ODBC driver version is currently installed and working on your PVC connection, that would help narrow things down further -- you could potentially use that same version pointed at your SaaS workspace endpoint.&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 04:39:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unicode-converter-buffer-overflow-error/m-p/150127#M53257</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T04:39:50Z</dc:date>
    </item>
  </channel>
</rss>

