How to replace LF with ' ' in a UTF-16 encoded CSV?
01-09-2023 11:48 AM
I have tried several pieces of code and nothing worked. An extra LF pushes part of a row onto the next row in my output. Most rows end in CRLF, but some rows end in a bare LF, and reading the CSV then gives incorrect output. My CSV uses a double dagger as the delimiter.
The CSV looks like this:
‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡
‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately
and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:48.020000000‡‡
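One practical option is to pre-process the file before Spark reads it: decode the UTF-16 bytes, replace every LF that is not preceded by CR with a space, and re-encode, so the only remaining line breaks are the CRLF row endings. This is a minimal sketch (the helper name and the in-memory sample are my own; on Databricks you would read/write the actual file under `/dbfs/...` instead):

```python
import re

def strip_bare_lf(raw: bytes) -> bytes:
    """Decode UTF-16, replace any LF not preceded by CR with a space,
    and re-encode, so the only line breaks left are CRLF row endings."""
    text = raw.decode("utf-16")
    cleaned = re.sub(r"(?<!\r)\n", " ", text)  # negative lookbehind: bare LF only
    return cleaned.encode("utf-16")

# Two-row sample where the second row's field contains a bare LF.
sample = "‡‡Id‡‡,‡‡Questionnaire‡‡\r\n‡‡123456‡‡,‡‡line one\nline two‡‡\r\n".encode("utf-16")
fixed = strip_bare_lf(sample).decode("utf-16")
# The bare LF becomes a space; both CRLF row endings survive.
```

After writing the cleaned bytes back out, the existing spark.read options should see one physical line per record.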
I tried the code below:
dff = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option("encoding", "UTF-16") \
    .option("delimiter", "‡‡,‡‡") \
    .option("multiLine", True) \
    .csv("/mnt/path/data.csv")
from pyspark.sql.functions import regexp_replace

dffs_headers = dff.dtypes
display(dff)
for i in dffs_headers:
    columnLabel = i[0]
    newColumnLabel = columnLabel.replace('‡‡', '')
    dff = dff.withColumn(newColumnLabel, regexp_replace(columnLabel, '^‡‡|‡‡$', ''))
    if columnLabel != newColumnLabel:
        dff = dff.drop(columnLabel)
display(dff)
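The per-column cleanup in that loop boils down to stripping a leading and a trailing double dagger. The same anchored pattern passed to regexp_replace can be checked in plain Python (helper name is mine, for illustration only):

```python
import re

def strip_daggers(value: str) -> str:
    # Same pattern the loop passes to regexp_replace:
    # '^‡‡' matches a leading pair, '‡‡$' a trailing pair.
    return re.sub(r"^‡‡|‡‡$", "", value)

print(strip_daggers("‡‡123456‡‡"))   # 123456
```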
Can I use regexp_replace with the pattern '(?<!\r)\n' to drop the bare LFs? If so, how and where?
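The pattern needs its opening parenthesis: `(?<!\r)\n` is a negative lookbehind that matches an LF only when it is not preceded by CR, so CRLF row endings are untouched. In Spark you would pass that same pattern to regexp_replace on the affected string columns (Spark uses Java regex, which supports lookbehind); the sketch below just demonstrates the pattern itself with Python's re module:

```python
import re

# LF not preceded by CR (negative lookbehind) -- matches only bare LFs.
bare_lf = re.compile(r"(?<!\r)\n")

text = "row one\r\nbroken\nrow\r\n"
result = bare_lf.sub(" ", text)
# The bare LF inside "broken\nrow" becomes a space; both CRLFs remain.
```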
Please help @ArunKumar-Databricks @Gustavo Barreto @ANUJ GARG @
- Labels:
- Azure databricks
- Pyspark
- Python
01-10-2023 04:48 AM
Can you share a sample file with rows ending in CRLF and rows ending in LF?
01-10-2023 05:59 AM
Hi @shamly pt ,
Can you please share a sample file with the ***** data and the expected output, so that we can try it at our end and let you know.
Happy Learning!!
01-11-2023 09:37 AM
Hi,
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
val df = sqlContext.read.format("csv")
    .option("header", "true")
    .option("delimiter", "your delimiter")
    .option("inferSchema", "true")
    .load("csv file")
Can you try this? If it does not work, then you need to read the file as an RDD, convert it to a DataFrame, and write it back to CSV:
CSV --> RDD --> DF --> FINAL_OUTPUT format
01-11-2023 09:39 AM
val df = spark.read.format("csv")
    .option("header", true)
    .option("sep", "||")
    .load("file load")
display(df)
Try this.

