<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Converting to UTF-8 is not reflected when writing a pandas DataFrame to a CSV file in the workspace? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/converting-utf-8-is-not-reflecting-panda-data-frame-to-workspace/m-p/109937#M43438</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I would like to convert a specific column to UTF-8 so that text in every country's language is preserved. After converting it, writing the CSV file to the workspace, downloading it to my local system and opening it in Excel, characters from other languages are still not displayed properly. What am I missing here?&lt;/P&gt;</description>
    <pubDate>Wed, 12 Feb 2025 06:58:53 GMT</pubDate>
    <dc:creator>SivaPK</dc:creator>
    <dc:date>2025-02-12T06:58:53Z</dc:date>
    <item>
      <title>Converting to UTF-8 is not reflected when writing a pandas DataFrame to a CSV file in the workspace?</title>
      <link>https://community.databricks.com/t5/data-engineering/converting-utf-8-is-not-reflecting-panda-data-frame-to-workspace/m-p/109937#M43438</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I would like to convert a specific column to UTF-8 so that text in every country's language is preserved. After converting it, writing the CSV file to the workspace, downloading it to my local system and opening it in Excel, characters from other languages are still not displayed properly. What am I missing here?&lt;/P&gt;&lt;PRE&gt;# Encode the "term" column to UTF-8
df_pandas['term'] = df_pandas['term'].apply(lambda x: x.encode('utf-8').decode('utf-8') if isinstance(x, str) else x)&lt;/PRE&gt;&lt;PRE&gt;%python
# Define the output path (relative to the notebook's working directory in the workspace)
output_path = "./Files/output_utf_8.csv"

# Save DataFrame as CSV with UTF-8 encoding
df_pandas.to_csv(output_path, encoding='utf-8', index=False)
#df_pandas.to_csv(output_path)

print(f"File saved at: {output_path}")&lt;/PRE&gt;&lt;P&gt;Thank you.&lt;/P&gt;</description>
      <pubDate>Wed, 12 Feb 2025 06:58:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/converting-utf-8-is-not-reflecting-panda-data-frame-to-workspace/m-p/109937#M43438</guid>
      <dc:creator>SivaPK</dc:creator>
      <dc:date>2025-02-12T06:58:53Z</dc:date>
    </item>
    <item>
      <title>Re: Converting to UTF-8 is not reflected when writing a pandas DataFrame to a CSV file in the workspace?</title>
      <link>https://community.databricks.com/t5/data-engineering/converting-utf-8-is-not-reflecting-panda-data-frame-to-workspace/m-p/109939#M43440</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/73754"&gt;@SivaPK&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;The CSV file itself may well be fine. Excel often does not detect UTF-8 in a CSV file automatically and may instead assume the system's ANSI code page (for example Windows-1252), which is why the characters appear garbled.&lt;/P&gt;
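&lt;P&gt;If the goal is for Excel to open the file correctly by double-clicking it, one common option is to write a UTF-8 byte order mark (BOM) at the start of the file, which many Excel versions use to recognize UTF-8. Below is a minimal sketch, assuming pandas is available; the sample data and the local file name are hypothetical stand-ins for the question's &lt;CODE&gt;df_pandas&lt;/CODE&gt; and &lt;CODE&gt;./Files/output_utf_8.csv&lt;/CODE&gt;. It also shows that the &lt;CODE&gt;encode('utf-8').decode('utf-8')&lt;/CODE&gt; round trip returns the same string, so the encoding of the file is decided only by the &lt;CODE&gt;encoding&lt;/CODE&gt; argument of &lt;CODE&gt;to_csv&lt;/CODE&gt;.&lt;/P&gt;
&lt;PRE&gt;
```python
import pandas as pd

# Hypothetical sample data standing in for the question's df_pandas
df_pandas = pd.DataFrame({"term": ["café", "Straße", "日本語", "Привет"]})

# The round trip from the question returns an identical str, so it changes nothing
for term in df_pandas["term"]:
    assert term.encode("utf-8").decode("utf-8") == term

# "utf-8-sig" writes the UTF-8 BOM (bytes EF BB BF) before the data;
# many Excel versions use it to recognize the CSV as UTF-8
output_path = "output_utf_8_sig.csv"  # hypothetical local path
df_pandas.to_csv(output_path, encoding="utf-8-sig", index=False)

# Inspect the first three bytes of the written file
with open(output_path, "rb") as f:
    print(f.read(3))  # b'\xef\xbb\xbf'
```
&lt;/PRE&gt;
&lt;P&gt;Tools that read the file back with plain &lt;CODE&gt;utf-8&lt;/CODE&gt; will see the BOM as a leading U+FEFF character in the first header, so &lt;CODE&gt;utf-8-sig&lt;/CODE&gt; is mainly worth using when the CSV is meant to be opened in Excel.&lt;/P&gt;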
&lt;P&gt;To confirm this, open the CSV in a text editor such as Notepad++ or VS Code and set the encoding to UTF-8 manually. If the characters display correctly there, the file itself is encoded correctly and the problem is how Excel opens it.&lt;/P&gt;
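&lt;P&gt;A similar check can also be run from the notebook before downloading the file. Below is a minimal sketch using only the Python standard library; the helper name &lt;CODE&gt;check_utf8&lt;/CODE&gt; and the sample file names are hypothetical. It reads the raw bytes, reports whether they start with a UTF-8 BOM, and tries to decode them as UTF-8. Text containing non-ASCII characters saved as Windows-1252 typically fails to decode, while a UTF-8 file succeeds.&lt;/P&gt;
&lt;PRE&gt;
```python
import codecs


def check_utf8(path):
    """Return (decodes_as_utf8, has_bom, error_message) for the file at path."""
    with open(path, "rb") as f:
        raw = f.read()
    has_bom = raw.startswith(codecs.BOM_UTF8)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return False, has_bom, str(err)
    return True, has_bom, None


# Hypothetical sample files: the same text saved as UTF-8 and as Windows-1252
with open("sample_utf8.csv", "w", encoding="utf-8") as f:
    f.write("term\ncafé\n")
with open("sample_cp1252.csv", "w", encoding="cp1252") as f:
    f.write("term\ncafé\n")

print(check_utf8("sample_utf8.csv"))       # (True, False, None)
print(check_utf8("sample_cp1252.csv")[0])  # False
```
&lt;/PRE&gt;
&lt;P&gt;Passing the question's &lt;CODE&gt;output_path&lt;/CODE&gt; instead of the sample names would check the file written by &lt;CODE&gt;to_csv&lt;/CODE&gt;, assuming the notebook can read it with Python's built-in &lt;CODE&gt;open()&lt;/CODE&gt;.&lt;/P&gt;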
</description>
      <pubDate>Wed, 12 Feb 2025 08:16:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/converting-utf-8-is-not-reflecting-panda-data-frame-to-workspace/m-p/109939#M43440</guid>
      <dc:creator>filipniziol</dc:creator>
      <dc:date>2025-02-12T08:16:30Z</dc:date>
    </item>
  </channel>
</rss>

