10-18-2021 06:49 AM
Data from an external source is copied to ADLS, picked up by Databricks, and the massaged data is written to an outbound file. A special character — a question mark in a black diamond — appears in some fields of the outbound file and may break existing code. Its origin has not been identified.
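For context: the "question mark in a black diamond" is the Unicode replacement character U+FFFD, which decoders emit for bytes that are invalid in the declared encoding. A minimal sketch of how it arises (hypothetical data, not the actual pipeline):

```python
# "é" encoded as UTF-16LE bytes, then mistakenly decoded as UTF-8.
# Invalid byte sequences become U+FFFD ("question mark in a black diamond")
# when errors="replace" is in effect.
data = "é".encode("utf-16-le")                 # b'\xe9\x00'
garbled = data.decode("utf-8", errors="replace")
print("\ufffd" in garbled)                      # True — replacement char appeared
print(data.decode("utf-16-le"))                 # é — correct encoding recovers it
```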
- Labels:
  - Azure Databricks
Accepted Solutions
10-18-2021 08:38 AM
This is an encoding issue. You can try specifying the encoding while reading the file:
.option("encoding", "UTF-16LE")
Please refer to the below:
https://docs.microsoft.com/en-us/azure/databricks/kb/data-sources/json-unicode
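A sketch of the idea outside Spark (file name and content are hypothetical): write a UTF-16LE file, show that reading it with the default UTF-8 codec produces replacement characters, and that declaring the actual encoding — which is what `.option("encoding", "UTF-16LE")` does for the Spark reader — recovers the data.

```python
import os
import tempfile

# Hypothetical sample: customer data containing accented characters.
text = "José,São Paulo\n"

path = os.path.join(tempfile.mkdtemp(), "customers.csv")
with open(path, "w", encoding="utf-16-le") as f:
    f.write(text)

# Reading with the wrong encoding garbles the data into U+FFFD characters...
with open(path, encoding="utf-8", errors="replace") as f:
    wrong = f.read()
print("\ufffd" in wrong)   # True

# ...while declaring the correct encoding recovers it, analogous to
# spark.read.option("encoding", "UTF-16LE") on the DataFrame reader.
with open(path, encoding="utf-16-le") as f:
    print(f.read() == text)  # True
```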
10-18-2021 07:03 AM
Hi @Jazmine Kochan, what type of data is being copied? Does the data contain any Unicode characters or symbols such as ç or ã?
10-18-2021 07:28 AM
Hi Prabakar,
Thanks for the prompt response.
It is a text file with customer data.
I have not seen such characters in the data, but clients could enter them in free-text fields.
10-18-2021 07:44 AM
So yes, text could contain such characters.
10-18-2021 07:51 AM
So the cause of the issue is those Unicode characters. I believe there should be a fix for this. I shall check and get back here.
10-18-2021 07:58 AM
Thanks much!
10-18-2021 08:29 AM
Hi Prabakar,
Could it be the developer's code that is adding this special character?
11-10-2021 01:30 PM
Do I need to encode and decode too? Currently, incorrect data is displayed. @Prabakar Ammeappin
10-18-2021 08:04 AM
Are you sure it is Databricks that puts the special character in place?
It could also have happened during the copy from the external system to ADLS.
If you use Azure Data Factory, for example, you have to define the encoding (UTF-8, UTF-16, ...).
10-18-2021 08:15 AM
Hi,
Yes, we checked all the files in the flow. It is the output file from Databricks in which the question-mark character appears at the beginning of some lines in text fields.
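Replacement characters at the start of lines often point to a stray byte-order mark or a UTF-16 mis-decode. To pinpoint the affected records, one could scan the outbound file for lines beginning with U+FFFD — a sketch with a hypothetical file path and helper name:

```python
import os
import tempfile

def find_garbled_lines(path, encoding="utf-8"):
    """Return (line_number, line) pairs for lines that start with the
    Unicode replacement character U+FFFD."""
    hits = []
    with open(path, encoding=encoding, errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if line.startswith("\ufffd"):
                hits.append((lineno, line.rstrip("\n")))
    return hits

# Demo with a hypothetical outbound file: the second line starts with
# a UTF-16 BOM (0xFF 0xFE), which is invalid as UTF-8 and decodes to U+FFFD.
path = os.path.join(tempfile.mkdtemp(), "outbound.txt")
with open(path, "wb") as f:
    f.write("clean line\n".encode("utf-8"))
    f.write(b"\xff\xfe" + "bom-prefixed line\n".encode("utf-8"))

print(find_garbled_lines(path))  # flags line 2
```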

