PySpark to_date not coping with single-digit Day or Month

RobDineen
Contributor

Hi there, I have a simple PySpark to_date call, but it fails when days or months are single digits (1-9):

(screenshot: RobDineen_0-1731324661487.png)

Is there a nice, easy way to get around this?

Regards

Rob

VZLA
Databricks Employee

Hi @RobDineen,

You may try setting spark.sql.legacy.timeParserPolicy to suit your use case.

When set to LEGACY, java.text.SimpleDateFormat is used to format and parse dates/timestamps in a locale-sensitive manner, which was the approach before Spark 3.0.

When set to CORRECTED, classes from the java.time.* packages are used instead. The default is EXCEPTION, which throws a RuntimeException whenever the two approaches would produce different results.

spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY") 

or

spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")

I have been trying to solve it with the following new column on the fly:

if DayofMonth is in (1, 2, 3, 4, 5, 6, 7, 8, 9), then put a 0 before it; otherwise leave it as is.

(screenshot: RobDineen_0-1731332791231.png)


Obviously I'm inserting the 0 incorrectly, but I'm not sure how to fix it.

Nearly there:

(screenshot: RobDineen_1-1731333144746.png)
Hi @VZLA, any ideas on the workaround above? I'm nearly there.

RobDineen
Contributor

Resolved using format_string

 

from pyspark.sql.functions import format_string, when

# Zero-pad single-digit days; the otherwise() branch is implicitly cast to string.
dff = df.withColumn(
    "DayofMonthFormatted",
    when(df.DayofMonth.isin([1, 2, 3, 4, 5, 6, 7, 8, 9]),
         format_string("0%d", df.DayofMonth)).otherwise(df.DayofMonth),
)
