Re: Data shifted when a pyspark dataframe column o...

fabien_arnaud · ‎10-21-2024

Yes, the dataframe is read from a CSV file. Here is the code:

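df_input = (
    spark.read
    .format('CSV')
    .options(
        header=True,
        delimiter=",",
        quote='"',
        escape='"',          # with quote='"', an embedded quote is written as "" (doubled)
        inferSchema='false',
        encoding='UTF8',
        multiline=True,      # let quoted fields span several physical lines
        # rootTag, rowTag and attributePrefix are XML-reader options; the CSV source ignores them
        rootTag='',
        rowTag='',
        attributePrefix='',
    )
    .load("dbfs:/mnt/bdwuploaddevfabien-mdm/mdm_vendor_master_2024-09-10.csv")
)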

Here is a screenshot of the subsequently filtered dataframe, as suggested. The problem persists:

[screenshot not included]

By the way, I also tested the code on Databricks Runtime 13.3 LTS, 14.3 LTS and 15.4 LTS: the issue occurs on all of them except 15.4 LTS.
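To rule out a malformed source file independently of the Spark runtime, one option is to parse a local copy of the CSV with Python's standard `csv` module, which follows the same quoting convention as the reader options above (fields quoted with `"`, embedded quotes doubled, newlines allowed inside quoted fields). This is a minimal sketch, not a fix: the function name `find_misaligned_records` is hypothetical, and it only flags records whose field count differs from the header, which is what "shifted" columns usually look like.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_misaligned_records(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (record_number, field_count) for every record whose field
    count differs from the header's. Hypothetical helper for diagnosis only."""
    # delimiter=",", quotechar='"', doublequote=True mirrors
    # delimiter=",", quote='"', escape='"' in the Spark reader options.
    reader = csv.reader(lines, delimiter=",", quotechar='"', doublequote=True)
    header = next(reader)
    expected = len(header)
    misaligned = []
    for record_number, row in enumerate(reader, start=1):
        if not row:  # skip blank lines such as a trailing empty line
            continue
        if len(row) != expected:
            misaligned.append((record_number, len(row)))
    return misaligned


if __name__ == "__main__":
    # Made-up sample: a comma inside quotes, a multi-line field,
    # doubled quotes, and one genuinely broken record (record 4).
    sample = (
        'id,name,city\n'
        '1,"Acme, Inc.",Paris\n'
        '2,"Line one\nline two",Lyon\n'
        '3,"He said ""hi""",Nice\n'
        '4,broken,row,extra\n'
    )
    print(find_misaligned_records(io.StringIO(sample)))  # → [(4, 4)]
```

For the real file, open a local copy with `open(path, newline="", encoding="utf-8")` and pass the file object in; `newline=""` matters so that line breaks inside quoted fields are preserved. If this reports no misaligned records while the older runtimes still shift columns, the problem is more likely on the reader side than in the file.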