Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Unicode field separator to create unmanaged table in Databricks for CSV file

RajibRajib_Mand
New Contributor III

We are receiving a CSV file separated by \u318a (ㆊ). We want to create an unmanaged table in Databricks. Here is the table creation script:

CREATE TABLE IF NOT EXISTS db_test_raw.t_data_otc_poc (
  `caseidt` STRING,
  `worktype` STRING,
  `doctyp` STRING,
  `brand` STRING,
  `reqemailid` STRING,
  `subprocess` STRING,
  `accountname` STRING,
  `location` STRING,
  `lineitems` STRING,
  `emailsubject` STRING,
  `createddate` STRING,
  `process` STRING,
  `archivalbatchid` STRING,
  `createddt` STRING,
  `customername` STRING,
  `invoicetype` STRING,
  `month` STRING,
  `payernumber` STRING,
  `sapaccountnumber` STRING,
  SOURCE_BUSINESS_DATE DATE
)
USING CSV
OPTIONS (
  header 'true',
  encoding 'UTF-8',
  quote '"',
  escape '"',
  delimiter '\u318a',
  path 'abfss://xxxx@yyyyy.dfs.core.windows.net/Raw/OPERATIONS/BUSINESSSERVICES/***/xx_DATA_OTC'
)
PARTITIONED BY (SOURCE_BUSINESS_DATE)

The table was created successfully in Databricks.

However, when checking with describe table extended db_test_raw.t_data_otc_poc, we found the storage properties reported as [encoding=UTF-8, quote=", escape=", header=true, delimiter=?]. The delimiter got changed.

Can you please let us know what went wrong here?

Also, all data is loaded into the first column, and the values for the rest of the columns are null.
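As a first sanity check (not from the thread; a minimal sketch using made-up sample data), you can confirm in plain Python that U+318A is a single character and that a row in that shape actually splits on it:

```python
# Hypothetical sample row using the U+318A (ㆊ) separator.
# The real file path and data are not shown in the thread.
sample_line = "C001\u318aInvoice\u318aPDF"

sep = "\u318a"
print(len(sep))                 # 1 -- U+318A is a single character
print(sample_line.split(sep))   # ['C001', 'Invoice', 'PDF']
```

If a line from the real file does not split into the expected number of fields here, the problem is in the file itself rather than in the table definition.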

7 REPLIES

RajibRajib_Mand
New Contributor III

Also, all data is loaded into a single column. The values of the other columns are stored as null.

Hubert-Dudek
Esteemed Contributor III

Try using sep instead of delimiter, and/or the \x escape instead of \u:

sep "\u318a"

delimiter "\x318a"

sep "\x318a"

Thanks @Hubert Dudek​ for your response. I tried these options; unfortunately, they did not work.

Have you tried using "multiLine"? Also, try reading the file as a CSV first to validate it; once you have confirmed the data is correct, you can create the table.

For example:

df = (spark.read
      .option("header", True)
      .option("multiLine", True)
      .option("escape", "_especial_value_")
      .csv("path_to_CSV_data"))
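To validate the file outside Spark as well (a hedged sketch with invented sample data, not the asker's actual file), Python's built-in csv module also accepts U+318A as a delimiter, since it is a single character:

```python
import csv
import io

# Invented two-row sample in the same shape as the described file.
data = "caseidt\u318aworktype\nA1\u318aOTC\n"

# csv.reader requires a 1-character delimiter; "\u318a" qualifies.
rows = list(csv.reader(io.StringIO(data), delimiter="\u318a"))
print(rows)   # [['caseidt', 'worktype'], ['A1', 'OTC']]
```

If this parses the real file correctly but Spark does not, that points at how the delimiter option is being passed to Spark rather than at the file itself.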

Hi @Rajib Rajib Mandal​ ,

Just a friendly follow-up. Do you still need help with this question? Did any of our responses help resolve the issue? If yes, please mark it as the best answer.

Hi @Jose Gonzalez ,

Yes, I still need help. No response has resolved it yet.

Regards,

Rajib

Hi @Rajib Rajib Mandal​,

What have you tried so far? It will help us narrow down the scope. Please share as many details as possible.
