PySpark to_date not coping with single-digit Day or Month

RobDineen
Contributor

Hi there, I have a simple PySpark to_date call, but it fails when days or months are single digits (1-9):

(screenshot: RobDineen_0-1731324661487.png)

Is there a nice, easy way to get around this?

Regards

Rob

VZLA
Databricks Employee

Hi @RobDineen,

You may try setting spark.sql.legacy.timeParserPolicy to suit your use case.

When set to LEGACY, java.text.SimpleDateFormat is used to format and parse dates/timestamps in a locale-sensitive manner, which was the approach before Spark 3.0.

When set to CORRECTED, classes from the java.time.* packages are used instead. The default is EXCEPTION, which throws a RuntimeException whenever the two approaches would produce different results.

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY") 

or

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

I have been trying to solve it with the following new column on the fly:

if DayofMonth is in (1, 2, 3, 4, 5, 6, 7, 8, 9), then put a 0 before it; otherwise leave it as is.

(screenshot: RobDineen_0-1731332791231.png)


Obviously I'm inserting the 0 incorrectly, but I'm not sure how to fix it.

Nearly there:

(screenshot: RobDineen_1-1731333144746.png)
Hi @VZLA, any ideas on the workaround above? I'm nearly there.

RobDineen
Contributor

Resolved using format_string

 

from pyspark.sql.functions import format_string, when

# Zero-pad single-digit days; the otherwise() branch is implicitly cast to string.
dff = df.withColumn(
    "DayofMonthFormatted",
    when(df.DayofMonth.isin([1, 2, 3, 4, 5, 6, 7, 8, 9]),
         format_string("0%d", df.DayofMonth)).otherwise(df.DayofMonth),
)
