<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to replace LF and replace with ' ' in csv UTF-16 encoded? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12869#M7624</link>
    <description>&lt;P&gt;Hi @shamly pt,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you please share a sample file with the ***** data, along with the expected output, so that we can try it on our end and get back to you?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Happy Learning!!&lt;/P&gt;</description>
    <pubDate>Tue, 10 Jan 2023 13:59:55 GMT</pubDate>
    <dc:creator>Chaitanya_Raju</dc:creator>
    <dc:date>2023-01-10T13:59:55Z</dc:date>
    <item>
      <title>How to replace LF and replace with ' ' in csv UTF-16 encoded?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12867#M7622</link>
      <description>&lt;P&gt;I have tried several approaches and none of them worked. An extra line feed (LF) pushes part of a row onto the next row in my output. All rows should end in CRLF, but some lines end in a bare LF, and when I read the CSV the output is not correct. My CSV uses a double dagger (‡‡) as the delimiter.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The CSV looks like this:&lt;/P&gt;&lt;P&gt;‡‡Id‡‡,‡‡Version‡‡,‡‡Questionnaire‡‡,‡‡Date‡‡&lt;/P&gt;&lt;P&gt;‡‡123456‡‡,‡‡Version2‡‡,‡‡All questions have been answered accurately&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and the guidance in the questionnaire was understood and followed‡‡,‡‡2010-12-16 00:01:48.020000000‡‡&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I tried the code below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark.sql.functions import regexp_replace
&amp;nbsp;
dff = spark.read.option("header", "true") \
    .option("inferSchema", "true") \
    .option('encoding', 'UTF-16') \
    .option("delimiter", "‡‡,‡‡") \
    .option("multiLine", True) \
    .csv("/mnt/path/data.csv")
dffs_headers = dff.dtypes
display(dff)
for i in dffs_headers:
    columnLabel = i[0]
    newColumnLabel = columnLabel.replace('‡‡', '')
    dff = dff.withColumn(newColumnLabel, regexp_replace(columnLabel, '^\\‡‡|\\‡‡$', ''))
    if columnLabel != newColumnLabel:
        dff = dff.drop(columnLabel)
&amp;nbsp;
display(dff)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Can I use a regex replace such as regexp_replace('(?&amp;lt;!\r)\n', '') for this? If so, how, and where in the code?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Please help @ArunKumar-Databricks&amp;nbsp;@Gustavo Barreto&amp;nbsp;@ANUJ GARG&lt;/P&gt;</description>
      <pubDate>Mon, 09 Jan 2023 19:48:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12867#M7622</guid>
      <dc:creator>shamly</dc:creator>
      <dc:date>2023-01-09T19:48:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to replace LF and replace with ' ' in csv UTF-16 encoded?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12868#M7623</link>
      <description>&lt;P&gt;Can you share a sample file with rows ending in CRLF and rows ending in LF?&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jan 2023 12:48:28 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12868#M7623</guid>
      <dc:creator>RaghavendraY</dc:creator>
      <dc:date>2023-01-10T12:48:28Z</dc:date>
    </item>
    <item>
      <title>Re: How to replace LF and replace with ' ' in csv UTF-16 encoded?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12869#M7624</link>
      <description>&lt;P&gt;Hi @shamly pt,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could you please share a sample file with the ***** data, along with the expected output, so that we can try it on our end and get back to you?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Happy Learning!!&lt;/P&gt;</description>
      <pubDate>Tue, 10 Jan 2023 13:59:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12869#M7624</guid>
      <dc:creator>Chaitanya_Raju</dc:creator>
      <dc:date>2023-01-10T13:59:55Z</dc:date>
    </item>
    <item>
      <title>Re: How to replace LF and replace with ' ' in csv UTF-16 encoded?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12870#M7625</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc);
&amp;nbsp;
val df = sqlContext.read.format("csv")
            .option("header", "true")
            .option("delimiter", "your delimiter")
            .option("inferSchema", "true")
            .load("csv file")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;can you try this. if this not work&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;then you need to read the file in RDD and convert to df and write back to CSV&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;CSV --&amp;gt; RDD --&amp;gt; DF --&amp;gt; FINAL_OUTPUT format&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2023 17:37:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12870#M7625</guid>
      <dc:creator>sher</dc:creator>
      <dc:date>2023-01-11T17:37:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to replace LF and replace with ' ' in csv UTF-16 encoded?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12871#M7626</link>
      <description>&lt;PRE&gt;&lt;CODE&gt;val df = spark.read.format("csv")
  .option("header", true)
  .option("sep", "||")
  .load("file load")
display(df)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Try this.&lt;/P&gt;</description>
      <pubDate>Wed, 11 Jan 2023 17:39:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-replace-lf-and-replace-with-in-csv-utf-16-encoded/m-p/12871#M7626</guid>
      <dc:creator>sher</dc:creator>
      <dc:date>2023-01-11T17:39:57Z</dc:date>
    </item>
  </channel>
</rss>

