Data Engineering

How to replace LF with ' ' in a UTF-16 encoded CSV?

shamly
New Contributor III

I have tried several pieces of code and nothing worked. An extra space or a stray LF pushes data onto the next row in my output. Most rows end in CRLF, but some rows end in LF, and while reading the CSV the output is not correct. My CSV uses a double dagger as the delimiter.

The CSV looks like this:

‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡

‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately

and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:48.020000000‡‡

I tried the code below:

dff = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("encoding", "UTF-16") \
    .option("delimiter", "‡‡,‡‡") \
    .option("multiLine", True) \
    .csv("/mnt/path/data.csv")

from pyspark.sql.functions import regexp_replace

dffs_headers = dff.dtypes

display(dff)

for i in dffs_headers:
  columnLabel = i[0]
  newColumnLabel = columnLabel.replace('‡‡', '')
  dff = dff.withColumn(newColumnLabel, regexp_replace(columnLabel, '^‡‡|‡‡$', ''))
  if columnLabel != newColumnLabel:
    dff = dff.drop(columnLabel)

display(dff)

Can I use a regex replace like regexp_replace('(?<!\r)\n', ' ') to remove the lone LFs? But how and where?
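For what it's worth, the lookbehind pattern itself can be checked outside Spark first. A minimal sketch in plain Python `re`, with made-up sample data shaped like the rows in the question:

```python
import re

# Hypothetical sample: the real row ends in CRLF, but one field
# contains a bare LF that splits the record across two lines.
raw = "‡‡123‡‡,‡‡answered accurately\nand understood‡‡\r\n"

# (?<!\r)\n matches an LF that is NOT preceded by CR, i.e. only the
# stray breaks inside fields, leaving the real CRLF row endings intact.
cleaned = re.sub(r"(?<!\r)\n", " ", raw)

print(cleaned)  # bare LF becomes a space; the trailing \r\n survives
```

If this behaves as expected on a sample, the same pattern should be usable in Spark's `regexp_replace` on the raw text before CSV parsing.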

Please help @ArunKumar-Databricks @Gustavo Barreto @ANUJ GARG

5 REPLIES

poet_RY
New Contributor III

Can you share a sample file with rows ending in CRLF, and one with rows ending in LF?

Chaitanya_Raju
Honored Contributor

Hi @shamly pt​ ,

Can you please share the sample file with the ***** data and also the expected output, so that we can try it at our end and let you know?

Happy Learning!!

sher
Valued Contributor II

hi

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.format("csv")
  .option("header", "true")
  .option("delimiter", "your delimiter")
  .option("inferSchema", "true")
  .load("csv file")

Can you try this? If this does not work,

then you need to read the file into an RDD, clean it up, convert it to a DataFrame, and write it back out as CSV:

CSV --> RDD --> DF --> FINAL_OUTPUT format
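The cleanup that the RDD stage would have to perform can be sketched in plain Python (no Spark here, just the per-chunk transformation, assuming UTF-16 bytes and the ‡‡,‡‡ delimiter from the question):

```python
import re

# Made-up in-memory sample standing in for the file's bytes:
# header row, then one record whose field contains a bare LF.
data = "‡‡Id‡‡,‡‡Answer‡‡\r\n‡‡123‡‡,‡‡line one\nline two‡‡\r\n".encode("utf-16")

text = data.decode("utf-16")                    # handles the UTF-16 BOM
text = re.sub(r"(?<!\r)\n", " ", text)          # merge bare-LF continuations
rows = [r for r in text.split("\r\n") if r]     # real rows end in CRLF
records = [[f.strip("‡") for f in r.split("‡‡,‡‡")] for r in rows]

print(records)  # [['Id', 'Answer'], ['123', 'line one line two']]
```

In Spark, the same steps could run over `sc.binaryFiles` or `spark.read.text` output before building the DataFrame.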

sher
Valued Contributor II
val df = spark.read.format("csv")
  .option("header", true)
  .option("sep", "||")
  .load("file load")

display(df)
 
try this

Kaniz
Community Manager

Hi @shamly pt, we haven't heard from you since the last response from @sherbin w and @Ratna Chaitanya Raju Bandaru, and I was checking back to see if their suggestions helped you.

Or else, if you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
