Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

German Umlauts wrong via JDBC

jordan72
New Contributor III

Hi,

I have the issue that German Umlauts are not getting retrieved correctly via the JDBC driver.

It shows M�nchen instead of München.

I load the driver in my Java app via:

<dependency>
    <groupId>com.databricks</groupId>
    <artifactId>databricks-jdbc</artifactId>
    <version>2.7.3</version>
</dependency>

and set the charsets via:

System.setProperty("file.encoding", "UTF-8");
System.setProperty("sun.jnu.encoding", "UTF-8");

In the Databricks UI everything looks correct. The column type is STRING.

Regards

Volker Jordan

2 Accepted Solutions

jordan72
New Contributor III

OK, so it seems to have something to do with the newly introduced native.encoding system property.

So in NetBeans you have to pass -Dstdout.encoding=utf-8 to the VM if you are using JDK 21.
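For cases where the VM options cannot be changed, a rough programmatic counterpart to the flag above is to replace System.out with a PrintStream that always encodes as UTF-8. This is only a sketch, not something taken from the thread: the class name Utf8Stdout and the helper utf8Stream are made up for illustration, and it only helps if the console that displays the output actually interprets the bytes as UTF-8.

```java
import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Hypothetical helper class, not part of the Databricks driver or NetBeans.
public class Utf8Stdout {

    /** Wraps any byte stream in an auto-flushing PrintStream that encodes text as UTF-8. */
    public static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Replace System.out for the rest of the program.
        System.setOut(utf8Stream(new FileOutputStream(FileDescriptor.out)));
        // \u00fc is the letter ü, escaped so the source file's own encoding does not matter.
        System.out.println("M\u00fcnchen");
    }
}
```

The VM option remains the simpler fix when it is available, since it needs no code change.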


szymon_dybczak
Esteemed Contributor III

Yes, this is exactly what the link I provided above suggested:

[Screenshot: szymon_dybczak_0-1751298256641.png]

8 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @jordan72,

Maybe try adding the following parameters to your JDBC connection URL:

- CharacterEncoding=UTF-8;
- UseUnicode=true;
- CharSet=UTF-8;

String url = "jdbc:databricks://<your-host>:443/default;transportMode=http;ssl=1;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<token>;CharSet=UTF-8;characterEncoding=UTF-8;UseUnicode=true;";

 

jordan72
New Contributor III

I already tried all of those parameters, but nothing changed.

Surprisingly, in DataGrip (which also uses the JDBC driver) the results are correct. I copied the URL from DataGrip into a plain Java program in my IDE, and there it does not work.

szymon_dybczak
Esteemed Contributor III

OK, thanks for the additional information. So maybe the issue is related to the JVM environment.
I noticed that you're setting the following property: System.setProperty("file.encoding", "UTF-8");
Java reads file.encoding once at JVM startup, so setting it with System.setProperty at runtime has no effect on string decoding in most libraries, including JDBC drivers.

Try launching your application with the following VM option:


java -Dfile.encoding=UTF-8
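To see what the JVM actually picked up at startup, a small diagnostic like the following can be run inside the same IDE run configuration. It is a sketch, not part of the original answer: the class name EncodingReport and its helper method are invented for illustration. It prints the encoding-related system properties (some are absent on older JDKs and print as null) and checks whether changing file.encoding at runtime moves the default charset.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class for inspecting the JVM's encoding settings.
public class EncodingReport {

    /**
     * Sets file.encoding at runtime and reports whether the JVM's default
     * charset changed as a result. The original property value is restored.
     */
    public static boolean runtimeOverrideChangesDefault(String candidate) {
        Charset before = Charset.defaultCharset(); // resolved from the startup settings
        String saved = System.getProperty("file.encoding");
        try {
            System.setProperty("file.encoding", candidate);
            return !Charset.defaultCharset().equals(before);
        } finally {
            if (saved != null) {
                System.setProperty("file.encoding", saved);
            } else {
                System.clearProperty("file.encoding");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("default charset = " + Charset.defaultCharset());
        // Some of these properties do not exist on older JDKs and print as null.
        for (String key : new String[] {
                "file.encoding", "native.encoding", "stdout.encoding", "sun.jnu.encoding"}) {
            System.out.println(key + " = " + System.getProperty(key));
        }
        System.out.println("runtime override changed default charset: "
                + runtimeOverrideChangesDefault("ISO-8859-1"));
    }
}
```

Comparing this output between two IDEs, or between an IDE and a plain terminal, should show which setting differs.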

 

Another thought: check whether this is a problem with your IDE configuration. Assuming you're using IntelliJ, check your file encoding settings: Settings -> Editor -> File Encodings.

Use the UTF-8, Luke! File Encodings in IntelliJ IDEA | The IntelliJ IDEA Blog

jordan72
New Contributor III

Hm, now it's getting even weirder. I usually use the NetBeans IDE. I tried the same code with Eclipse, and there it worked without any special options. In NetBeans, even with

-Dfile.encoding=UTF-8

there is no change. Does anyone know what can lead NetBeans to this behaviour?

szymon_dybczak
Esteemed Contributor III

OK, so that only confirms that the problem is not related to the driver. Rather, it is a quirk of NetBeans.
In NetBeans it's not sufficient to use only the option -Dfile.encoding=UTF-8.
Please follow the approach suggested in the following Stack Overflow thread, depending on the Java version you're using:

java - How to use UTF-8 character in Netbeans - Stack Overflow
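One way to double-check that the driver is not at fault is to print the string's code points instead of the string itself. The dump is plain ASCII, so it reads the same no matter how the console decodes output. This is a sketch with made-up names (CodePointDump, dump, containsReplacementChar); the literal stands in for a value that would come from something like ResultSet.getString.

```java
import java.util.stream.Collectors;

// Hypothetical helper for checking whether a string is intact in memory.
public class CodePointDump {

    /** Renders each code point as U+XXXX, separated by spaces. */
    public static String dump(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    /** True if the string already contains U+FFFD, the Unicode replacement character. */
    public static boolean containsReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }

    public static void main(String[] args) {
        // Stand-in for a value read from the database; \u00fc is the letter ü.
        String city = "M\u00fcnchen";
        System.out.println(dump(city));
        System.out.println("contains U+FFFD: " + containsReplacementChar(city));
    }
}
```

If the second code point is U+00FC, the string is intact in memory and only the console output is being encoded or decoded wrongly. If it is U+FFFD or anything else, the value was already altered before printing.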
