Unicode field separator to create unamanged table in databricks for csv file

RajibRajib_Mand
New Contributor III

We are receiving a CSV file whose fields are separated by \u318a (ㆊ). We want to create an unmanaged table in Databricks. Here is the table creation script:

create table IF NOT EXISTS db_test_raw.t_data_otc_poc (
  `caseidt` String,
  `worktype` String,
  `doctyp` String,
  `brand` String,
  `reqemailid` String,
  `subprocess` String,
  `accountname` String,
  `location` String,
  `lineitems` String,
  `emailsubject` String,
  `createddate` String,
  `process` String,
  `archivalbatchid` String,
  `createddt` String,
  `customername` String,
  `invoicetype` String,
  `month` String,
  `payernumber` String,
  `sapaccountnumber` String,
  SOURCE_BUSINESS_DATE Date
)
USING CSV
OPTIONS (
  header 'true',
  encoding 'UTF-8',
  quote '"',
  escape '"',
  delimiter '\u318a',
  path 'abfss://xxxx@yyyyy.dfs.core.windows.net/Raw/OPERATIONS/BUSINESSSERVICES/***/xx_DATA_OTC'
)
PARTITIONED BY (SOURCE_BUSINESS_DATE)

The table was created successfully in Databricks. However, when we checked it with

describe table extended db_test_raw.t_data_otc_poc

we found the storage properties reported as [encoding=UTF-8, quote=", escape=", header=true, delimiter=?]. The delimiter got changed.

Can you please let us know what went wrong here?

Also, all the data is loaded into the first column, and the values for the remaining columns are null.

7 REPLIES

RajibRajib_Mand
New Contributor III

Also, all the data is loaded into a single column; the values of the other columns are stored as null.

Hubert-Dudek
Esteemed Contributor III

Try using the `sep` option instead of `delimiter`, and/or the `\x` escape form instead of `\u`:

sep "\u318a"

delimiter "\x318a"

sep "\x318a"
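One thing worth verifying first is that the separator actually reaches the reader as a single character. In a Python string the escape "\u318a" is resolved by the interpreter itself, whereas inside a SQL string literal it may be passed through literally depending on how the parser handles escapes. A minimal sketch in plain Python (independent of Spark) that demonstrates the character and splits a sample line with the stdlib csv module:

```python
import csv
import io

# U+318A (ㆊ) is a single Unicode character; Python resolves the \u escape
# at parse time, so the reader receives the real character.
sep = "\u318a"
assert len(sep) == 1                            # one character...
assert sep.encode("utf-8") == b"\xe3\x86\x8a"   # ...but three bytes in UTF-8

# Simulate one line of the file and split it; Python's csv module accepts
# any single-character delimiter, including non-ASCII ones.
line = "caseidt" + sep + "worktype" + sep + "doctyp"
reader = csv.reader(io.StringIO(line), delimiter=sep)
print(next(reader))  # ['caseidt', 'worktype', 'doctyp']
```

Because the character is three bytes in UTF-8, any layer that truncates the option to one byte (or stores it in a non-UTF-8-safe property) can end up showing it as `?`, which matches what `describe table extended` reported.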

Thanks @Hubert Dudek​ for your response. I tried these options; unfortunately, none of them worked.

Have you tried using "multiLine"? Also, try reading the file with spark.read.csv first to validate the data; once you have confirmed the data is correct, you can create the table.

For example:

df = (spark.read
      .option("header", True)
      .option("multiLine", True)
      .option("escape", "_especial_value_")
      .csv("path_to_CSV_data"))
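Since everything landing in one column usually means the delimiter Spark used does not match what is actually in the file, it can help to inspect a raw header line first. A hedged sketch in plain Python (`detect_separator` is a hypothetical helper, not a Spark API), run against one line copied from the source file:

```python
SEP = "\u318a"  # the ㆊ separator the producer is expected to use

def detect_separator(first_line: str, candidates=(SEP, ",", "\t", "|", ";")):
    """Return the candidate separator that occurs most often in the line."""
    return max(candidates, key=lambda c: first_line.count(c))

# Stand-in for the first line of the real file on the abfss path.
header = "caseidt" + SEP + "worktype" + SEP + "doctyp"
print(detect_separator(header))  # prints the ㆊ character
```

If the detected separator is not U+318A, the file is not using the delimiter the table definition assumes, which would explain the single-column result regardless of how the option is escaped.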

Hi @Rajib Rajib Mandal​,

Just a friendly follow-up: do you still need help with this question? Did any of our responses help resolve the issue? If so, please mark the best one as the accepted answer.

Hi @Jose Gonzalez ,

Yes, I still need help. None of the responses has resolved it yet.

Regards,

Rajib

Hi @Rajib Rajib Mandal​,

What have you tried so far? It will help us narrow down the scope. Please share as many details as possible.
