04-03-2024 09:09 AM
Hi Everyone.
I am trying to connect to a Databricks table through a SQL Warehouse, read data from it, and return that data from an Azure API.
However, non-English characters such as 'Ä' appear in the response as follows: ��.
I am using the latest version of the databricks-jdbc driver.
I have tried to resolve it by setting the System properties as:
System.setProperty("file.encoding", "UTF-8");
System.setProperty("sun.jnu.encoding", "UTF-8");
Another thing that I tried was changing the connection string to contain:
useUnicode=true;characterEncoding=UTF-8
However, this causes the exception:
Internal Server Error: [Databricks][DatabricksJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, infoMessages:[*org.apache.hive.service.cli.HiveSQLException:Configuration useUnicode is not available
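A note on why the System.setProperty attempt likely had no effect: the JVM works out its default charset early and caches it, so Charset.defaultCharset() keeps returning the same value even if file.encoding is changed afterwards. The sketch below illustrates this. The class and method names are made up for illustration, and UTF-16 is used only because it is visibly different from the usual defaults.

```java
import java.nio.charset.Charset;

public class RuntimeEncodingProperty {

    /** Returns the JVM default charset before and after overriding file.encoding at runtime. */
    static Charset[] defaultCharsetBeforeAndAfter() {
        Charset before = Charset.defaultCharset();
        String original = System.getProperty("file.encoding");

        // Override the property the same way the question does, but with a
        // value that differs from typical defaults so a change would be visible.
        System.setProperty("file.encoding", "UTF-16");
        Charset after = Charset.defaultCharset();

        // Restore the property so the experiment leaves no side effects.
        if (original != null) {
            System.setProperty("file.encoding", original);
        } else {
            System.clearProperty("file.encoding");
        }
        return new Charset[] { before, after };
    }

    public static void main(String[] args) {
        Charset[] result = defaultCharsetBeforeAndAfter();
        System.out.println("default charset before: " + result[0]);
        System.out.println("default charset after:  " + result[1]);
        System.out.println("unchanged: " + result[0].equals(result[1]));
    }
}
```

If the two values match, the property change came too late to influence any code that relies on the default charset.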
04-08-2024 04:21 AM
Hi @Kaniz_Fatma
I resolved the issue by no longer setting the system properties in code at the start of execution and instead passing them to the JVM through JAVA_OPTS in the Azure Function environment variables. This way the JVM is already instantiated with the proper configuration.
Thanks a lot
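For anyone who wants to verify this in their own function app, here is a minimal sketch of a startup check. It assumes JAVA_OPTS carries something like -Dfile.encoding=UTF-8 -Dsun.jnu.encoding=UTF-8 (the exact value is not shown in this thread, so treat it as an assumption). The class and method names are hypothetical. It also reproduces the symptom: the UTF-8 bytes of 'Ä' decoded with a non-UTF-8 charset such as US-ASCII come out as two replacement characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {

    /** Encodes text as UTF-8, then decodes the bytes with the given charset. */
    static String roundTrip(String text, Charset decodeWith) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, decodeWith);
    }

    public static void main(String[] args) {
        // What the JVM was actually started with.
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));
        System.out.println("default charset  = " + Charset.defaultCharset());

        String sample = "\u00C4"; // 'Ä', written as an escape so the source file encoding does not matter

        // Matching charsets: the character survives.
        System.out.println("UTF-8 -> UTF-8:    " + roundTrip(sample, StandardCharsets.UTF_8));

        // Mismatch: each non-ASCII byte becomes U+FFFD, the replacement character.
        System.out.println("UTF-8 -> US-ASCII: " + roundTrip(sample, StandardCharsets.US_ASCII));

        // What this JVM does when no charset is passed explicitly.
        System.out.println("UTF-8 -> default:  " + roundTrip(sample, Charset.defaultCharset()));
    }
}
```

Independently of the JVM options, passing StandardCharsets.UTF_8 explicitly wherever the response is converted between strings and bytes avoids depending on the default charset at all.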
04-05-2024 04:02 AM
Hi @crankerkor,
JDBC Driver Configuration:
System Properties: Set the file.encoding and sun.jnu.encoding properties to UTF-8.
System.setProperty("file.encoding", "UTF-8");
System.setProperty("sun.jnu.encoding", "UTF-8");
JDBC Connection String:
useUnicode=true: This ensures that Unicode characters are handled correctly.
characterEncoding=UTF-8: Specifies the character encoding.
jdbc:databricks://<hostname>:443/default;useUnicode=true;characterEncoding=UTF-8
Charset Auto-Detection:
encoding option:
%scala
option("encoding", "UTF-16LE")
Database Collation:
Parquet Files:
Remember that handling character encoding involves coordination between your ETL tool, Databricks, and the JDBC driver. If you’ve tried all the steps above and still face issues, consider reaching out to Databricks support or community forums for further assistance.
07-09-2024 04:32 PM
If Databricks support or product managers follow this forum, I suggest reviewing the SIMBA-provided docs.
They do not discuss the name-value pairs mentioned above for UTF and encoding.
https://www.databricks.com/spark/jdbc-drivers-download
There are other gaps in the SIMBA docs regarding name-value pairs, including PreparedMetadataLimitZero.
07-09-2024 04:43 PM
If Databricks support or Product Management is following the forum, note that the PDF from SIMBA in 2.6.28 does not discuss the name-value pairs in the above solution.
Other errata include PreparedMetadataLimitZero.