Unicode field separator to create unmanaged table in Databricks for CSV file
03-24-2022 12:18 AM
We are receiving a CSV file separated by \u318a (ㆊ). We want to create an unmanaged table in Databricks. Here is the table creation script:
CREATE TABLE IF NOT EXISTS db_test_raw.t_data_otc_poc (
  `caseidt` String,
  `worktype` String,
  `doctyp` String,
  `brand` String,
  `reqemailid` String,
  `subprocess` String,
  `accountname` String,
  `location` String,
  `lineitems` String,
  `emailsubject` String,
  `createddate` String,
  `process` String,
  `archivalbatchid` String,
  `createddt` String,
  `customername` String,
  `invoicetype` String,
  `month` String,
  `payernumber` String,
  `sapaccountnumber` String,
  SOURCE_BUSINESS_DATE Date
)
USING CSV
OPTIONS (
  header 'true',
  encoding 'UTF-8',
  quote '"',
  escape '"',
  delimiter '\u318a',
  path 'abfss://xxxx@yyyyy.dfs.core.windows.net/Raw/OPERATIONS/BUSINESSSERVICES/***/xx_DATA_OTC'
)
PARTITIONED BY (SOURCE_BUSINESS_DATE)
The table was created successfully in Databricks.
While checking with describe table extended db_test_raw.t_data_otc_poc, we found the storage properties as [encoding=UTF-8, quote=", escape=", header=true, delimiter=?]. The delimiter got changed.
Can you please let us know what went wrong here?
Data is also loaded only into the first column, and the values for the rest of the columns are null.
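One way to confirm what separator the file actually contains (a PySpark sketch; the abfss path below is the same placeholder as in the script above, so adjust it to your environment):

# Read the raw text and inspect the code points of the first line;
# 0x318a should appear between fields if ㆊ really is the separator.
raw = spark.read.text("abfss://xxxx@yyyyy.dfs.core.windows.net/Raw/OPERATIONS/BUSINESSSERVICES/***/xx_DATA_OTC")
first_line = raw.first()["value"]
print([hex(ord(c)) for c in first_line[:80]])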
- Labels:
  - String
  - Unicode
  - Unicode Field Separator
03-24-2022 12:21 AM
Also, all data is loaded into a single column. The values of the other columns are stored as null.
03-24-2022 04:02 AM
sep "\u318a"
delimeter " \x318a"
sep " \x318a"
Try to use sep instead or/and x instead.
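For example, a minimal PySpark read using sep (a sketch; "path_to_CSV_data" is a placeholder, and the Python literal "\u318a" resolves to the actual ㆊ character):

df = (spark.read
      .option("header", True)    # first row contains column names
      .option("sep", "\u318a")   # single-character Unicode separator
      .csv("path_to_CSV_data"))
df.show(5, truncate=False)       # eyeball whether the columns split correctly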
03-24-2022 09:50 AM
Thanks @Hubert Dudek for your response. I tried these options; unfortunately, they did not work.
04-25-2022 02:33 PM
Have you tried using "multiLine"? Also, try reading the file with the CSV reader first to validate that the data is correct; then you can create the table.
For example:

df = (spark.read
      .option("header", True)
      .option("multiLine", True)
      .option("escape", "_especial_value_")  # placeholder for your escape character
      .csv("path_to_CSV_data"))
05-10-2022 10:17 AM
Hi @Rajib Rajib Mandal,
Just a friendly follow-up. Do you still need help with this question, or did any of our responses help resolve the issue? If so, please mark the best one as the solution.
06-13-2022 02:16 PM
Hi @Rajib Rajib Mandal,
What have you tried so far? It will help us narrow down the scope. Please share as many details as possible.

