<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Lakeflow Pipelines Trying to Read accented file with spark.readStream but failure in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135482#M50360</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192982"&gt;@AmarKap&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;When Spark decodes CP1252-encoded bytes with the wrong charset (for example UTF-8), the undecodable byte sequences are rendered as the replacement character&amp;nbsp;&lt;SPAN&gt;�&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Can you try reading the file as follows:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;EM&gt;df = (spark.readStream&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .format("cloudFiles")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "text")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("encoding", "windows-1252")&amp;nbsp; # or "CP1252"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .load("s3://.../path"))&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 21 Oct 2025 07:38:13 GMT</pubDate>
    <dc:creator>K_Anudeep</dc:creator>
    <dc:date>2025-10-21T07:38:13Z</dc:date>
    <item>
      <title>Lakeflow Pipelines Trying to Read accented file with spark.readStream but failure</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135440#M50351</link>
      <description>&lt;P&gt;Trying to read an accented file (French characters), but spark.readStream fails and special characters turn into something strange (e.g.&amp;nbsp;&lt;SPAN&gt;�&lt;/SPAN&gt;).&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;spark.readStream&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .format("cloudfiles")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "text")&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp; &amp;nbsp; .option("encoding", "ISO-8859-1")&lt;/DIV&gt;&lt;DIV&gt;&lt;BR /&gt;Tried both ISO-8859-1 and UTF-8.&lt;BR /&gt;Tried with and without .option("cloudFiles.format", "text").&lt;BR /&gt;Files do not contain a .txt extension.&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 20 Oct 2025 18:05:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135440#M50351</guid>
      <dc:creator>AmarKap</dc:creator>
      <dc:date>2025-10-20T18:05:21Z</dc:date>
    </item>
    <item>
      <title>Re: Lakeflow Pipelines Trying to Read accented file with spark.readStream but failure</title>
      <link>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135482#M50360</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/192982"&gt;@AmarKap&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;
&lt;P&gt;When Spark decodes CP1252-encoded bytes with the wrong charset (for example UTF-8), the undecodable byte sequences are rendered as the replacement character&amp;nbsp;&lt;SPAN&gt;�&lt;/SPAN&gt;.&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;Can you try reading the file as follows:&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;&lt;EM&gt;df = (spark.readStream&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .format("cloudFiles")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("cloudFiles.format", "text")&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .option("encoding", "windows-1252")&amp;nbsp; # or "CP1252"&lt;/EM&gt;&lt;BR /&gt;&lt;EM&gt;&amp;nbsp; &amp;nbsp; .load("s3://.../path"))&lt;/EM&gt;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 21 Oct 2025 07:38:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/lakeflow-pipelines-trying-to-read-accented-file-with-spark/m-p/135482#M50360</guid>
      <dc:creator>K_Anudeep</dc:creator>
      <dc:date>2025-10-21T07:38:13Z</dc:date>
    </item>
  </channel>
</rss>