
Getting date out of year and week

Pien
New Contributor II

Hi all,

I'm trying to build a date out of the year and week columns, but the week pattern is not recognized.

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))

But it doesn't work and I get this error:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'uwwyyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Any ideas how to get from columns week & year to date in pyspark?


5 REPLIES

pvignesh92
Honored Contributor

@Pien Derkx Hi, it would be really helpful if you could share some sample values you are trying to cast, along with your expected result, so that we can try it. That said, I have seen that in the newer Spark versions you can just use single-letter date patterns, e.g. MM/dd/yyyy --> M/d/y. Please see if the same works for you with uwy and let me know.
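
For example, a quick sketch of the single-letter idea on an ordinary date (sample data made up here, not from your table):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Made-up sample row: single-letter patterns accept the same inputs as
# the fully spelled-out forms when parsing in Spark 3.
df = spark.createDataFrame([("03/29/2023",)], ["dateinput"])
df.withColumn("parsed", F.to_date(F.col("dateinput"), "M/d/y")).show()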

Pien
New Contributor II

Hi @Vigneshraja Palaniraj! Thanks for the help.

Some example values: I'm getting year = 2023 and week = 13, and I combine them into a string:

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

I add '3' in front because I want the Wednesday of that week. This results in the string 3132023,

which I use to get the date:

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("3132023"), "uwy"))

The date I would like to get here is 2023-03-29 (the Wednesday in week 13).

The code itself raises no errors, but Spark evaluates lazily, so the error only shows up when I try to display the table.

ACCEPTED SOLUTION

pvignesh92
Honored Contributor

@Pien Derkx You can set the Spark SQL time parser policy to LEGACY and retry your steps. Below is what worked for me.

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
 
df.withColumn("date_expected", to_date(col("dateinput", "uwwyyyy")).show()

dateinput -> 3132023

Output for date_expected -> 2023-03-29

Please try and let me know if this works.
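
Putting the whole thread together, a rough end-to-end sketch (the sample row is made up to match your example, and the lpad zero-padding is an extra safeguard I added for single-digit weeks, not something from your original code):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Week-based patterns were dropped from the Spark 3 parser, so fall
# back to the legacy (SimpleDateFormat) parser for 'u' and 'ww'.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# Made-up sample row matching the example in this thread.
df = spark.createDataFrame([(13, 2023)], ["Week", "Jaar"])

df = (
    df
    # Prefix '3' (Wednesday) and zero-pad the week so single-digit
    # weeks still match the two-digit 'ww' field.
    .withColumn("week_year",
                F.concat(F.lit("3"),
                         F.lpad(F.col("Week").cast("string"), 2, "0"),
                         F.col("Jaar").cast("string")))
    # 'u' = day of week (3 = Wednesday), 'ww' = week of year, 'yyyy' = year.
    .withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))
)

df.show()  # expect week_year = 3132023, date = 2023-03-29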

Pien
New Contributor II

works, thanks!
