Data shifted when a PySpark dataframe column contains only a comma
10-21-2024 01:58 AM
I have a dataframe with several columns. In one of them, the value for one specific record is just a comma and nothing else.
When I display the dataframe with the command
display(df_input.where(col("erp_vendor_cd") == 'B6SA-VEN0008838'))
the data is displayed correctly for all of my columns.
However, when I select specific columns from the same dataframe, for example
display(df_input.where(col("erp_vendor_cd") == 'B6SA-VEN0008838').select(col("postal_cd"), col("state_cd"), col("state_nm"), col("country_cd"), col("country_nm")))
all of the data in the columns to the right of the one that contains only the comma is shifted to the left. The comma seems to be treated as a column separator during the "select", even though everything is loaded correctly in my dataframe.
How can I avoid this behavior?
I use Databricks Runtime 12.2 LTS and my notebook is in Python.
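One way to narrow this down is to check whether the shift exists in the data itself or only in the rendered table. The sketch below is a rough diagnostic, not a fix: it compares rows collected from the full dataframe with rows collected from the "select" projection. The helper name projection_matches and all sample values are hypothetical, and plain dicts stand in for the Row objects that collect() would return (both can be indexed by column name). Which column holds the comma is also an assumption here.

```python
def projection_matches(full_rows, selected_rows, columns):
    """Return True if each selected row has the same values as the
    corresponding full row for every column in `columns`."""
    if len(full_rows) != len(selected_rows):
        return False
    return all(
        all(full[c] == sel[c] for c in columns)
        for full, sel in zip(full_rows, selected_rows)
    )

columns = ["postal_cd", "state_cd", "state_nm", "country_cd", "country_nm"]

# Made-up stand-in for df_input.where(...).collect(); state_cd holds only a comma.
full_rows = [{
    "erp_vendor_cd": "B6SA-VEN0008838",
    "postal_cd": "1000", "state_cd": ",", "state_nm": "Example State",
    "country_cd": "XX", "country_nm": "Example Country",
}]

# Made-up stand-in for df_input.where(...).select(...).collect() when the data is intact.
selected_ok = [{c: full_rows[0][c] for c in columns}]

# Made-up example of what a genuinely shifted projection could look like.
selected_shifted = [{
    "postal_cd": "1000", "state_cd": "", "state_nm": "",
    "country_cd": "Example State", "country_nm": "XX",
}]

intact = projection_matches(full_rows, selected_ok, columns)
shifted = projection_matches(full_rows, selected_shifted, columns)
```

In the notebook, the two lists would come from calling collect() on the filtered dataframe and on the filtered-and-selected dataframe. If the check passes for the real data, the values are in the right fields and the shift is likely introduced when the result is rendered. With more than one matching record the row order of the two collect() calls may differ, so this comparison is only meaningful for a single-record filter like the one above.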