Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to replace LF with ' ' in a UTF-16 encoded CSV?

shamly
New Contributor III

I have tried several code snippets and nothing worked. An extra space or a stray LF pushes content onto the next row in my output. Most rows end in CRLF, but some rows contain a bare LF, so reading the CSV does not give the correct output. My CSV uses a double dagger as the delimiter.

The CSV looks like this:

‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡

‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately

and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:48.020000000‡‡

I tried the code below:

dff = spark.read.option("header", "true") \
  .option("inferSchema", "true") \
  .option("encoding", "UTF-16") \
  .option("delimiter", "‡‡,‡‡") \
  .option("multiLine", True) \
  .csv("/mnt/path/data.csv")

from pyspark.sql.functions import regexp_replace

dffs_headers = dff.dtypes
display(dff)

for i in dffs_headers:
  columnLabel = i[0]
  newColumnLabel = columnLabel.replace('‡‡', '')
  dff = dff.withColumn(newColumnLabel, regexp_replace(columnLabel, '^‡‡|‡‡$', ''))
  if columnLabel != newColumnLabel:
    dff = dff.drop(columnLabel)

display(dff)

Can I use regexp_replace('(?<!\r)\n', '') — but how and where?
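Note the lookbehind pattern needs an opening parenthesis: `(?<!\r)\n`. A minimal sketch of that regex in plain Python (the sample text is illustrative, not the real file), showing that bare LFs get replaced while genuine CRLF row endings survive:

```python
import re

# Illustrative sample: the first row ends in CRLF, but the second row
# contains a bare LF in the middle of a field.
raw = "‡‡Id‡‡,‡‡Text‡‡\r\n‡‡1‡‡,‡‡line one\nline two‡‡\r\n"

# Replace every LF that is NOT preceded by CR with a space,
# so real CRLF row endings are left untouched.
cleaned = re.sub(r"(?<!\r)\n", " ", raw)
print(cleaned)
```

Applied before Spark parses the CSV (for example on the raw text of the file), this would collapse the broken rows back into single lines.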

Please help @ArunKumar-Databricks @Gustavo Barreto @ANUJ GARG

4 REPLIES

RaghavendraY
New Contributor III

Can you share a sample file with rows ending in CRLF and in LF?

Chaitanya_Raju
Honored Contributor

Hi @shamly pt,

Can you please share the sample file with the ***** data and also the expected output, so that we can try it at our end and let you know.

Happy Learning!!

Thanks for reading and like if this is useful and for improvements or feedback please comment.

sher
Valued Contributor II

Hi,

import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)

val df = sqlContext.read.format("csv")
            .option("header", "true")
            .option("delimiter", "your delimiter")
            .option("inferSchema", "true")
            .load("csv file")

Can you try this? If it does not work, then you need to read the file as an RDD, convert it to a DataFrame, and write it back to CSV:

CSV --> RDD --> DF --> FINAL_OUTPUT format
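For the RDD route, a hedged sketch in Python of the per-line cleanup step (the helper name and the delimiter handling are assumptions based on the sample rows in the question, not tested against the real file):

```python
def clean_line(line):
    """Split a raw line on the ‡‡,‡‡ delimiter and strip the
    leading/trailing ‡‡ markers from each field."""
    return [field.strip("‡") for field in line.split("‡‡,‡‡")]

# Example with a row shaped like the sample in the question:
fields = clean_line("‡‡123456‡‡,‡‡Version2‡‡,‡‡2010-12-16 00:01:48.020000000‡‡")
# fields -> ["123456", "Version2", "2010-12-16 00:01:48.020000000"]
```

In the pipeline above, a helper like this would run as `rdd.map(clean_line)` on the raw text lines before converting to a DataFrame with `toDF(...)`.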

sher
Valued Contributor II
val df = spark.read.format("csv")
              .option("header", true)
              .option("sep", "||")
              .load("file load")
display(df)
 
Try this.
