Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Converting a pandas DataFrame column to UTF-8 is not reflected in the CSV file written to the workspace?

SivaPK
New Contributor II

Hi,

I would like to convert a specific column to UTF-8 so that text in all languages is preserved. After converting the column, writing the file to the workspace, downloading it to my local system, and opening it in Excel, the non-English characters are still not displayed properly. What am I missing here?

 

# Encode the "term" column to UTF-8 (round-trips each string through encode/decode)
df_pandas['term'] = df_pandas['term'].apply(lambda x: x.encode('utf-8').decode('utf-8') if isinstance(x, str) else x)
 
%python
# Define the output path in the workspace
output_path = "./Files/output_utf_8.csv"

# Save DataFrame as CSV with UTF-8 encoding
df_pandas.to_csv(output_path, encoding='utf-8', index=False)
#df_pandas.to_csv(output_path)

print(f"File saved at: {output_path}")
 
Thank you.
1 REPLY

filipniziol
Esteemed Contributor

Hi @SivaPK ,

The CSV itself may well be fine: Excel often does not automatically recognize a CSV file as UTF-8 and may assume ANSI/Windows-1252 instead, which is why the characters appear incorrect.

Instead, open the CSV in a text editor such as Notepad++ or VS Code and manually set the encoding to UTF-8 to confirm that the characters display correctly there.
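If you also want Excel to open the file correctly without any manual steps, a minimal sketch (assuming the same df_pandas DataFrame as in your code; output_utf_8_bom.csv is just an illustrative file name) is to write the CSV with pandas' 'utf-8-sig' encoding, which prepends a byte-order mark that Excel uses to detect UTF-8:

# Sketch: write the CSV with a UTF-8 byte-order mark (BOM) so Excel detects UTF-8.
# Assumes df_pandas already exists, as in the original code.
output_path = "./Files/output_utf_8_bom.csv"

# 'utf-8-sig' prepends the BOM bytes (EF BB BF) before the UTF-8 content
df_pandas.to_csv(output_path, encoding="utf-8-sig", index=False)

# Optional check: the first three bytes of the file should be the BOM
with open(output_path, "rb") as f:
    print(f.read(3) == b"\xef\xbb\xbf")  # True if the BOM was written

The content of the file is unchanged apart from those three leading bytes, so other tools that expect plain UTF-8 will still read it.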
