Getting date out of year and week

Pien
New Contributor II

Hi all,

I'm trying to get a date out of the columns year and week, but the week pattern is not recognized.

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))

But it doesn't work and I get this error:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'uwwyyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
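For reference, a minimal repro sketch of the above; the session setup and the sample row (year 2023, week 13) are illustrative, not from the thread:

# Repro sketch -- sample values are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# 'Jaar' is Dutch for 'year'.
df_loaded = spark.createDataFrame([("2023", "13")], ["Jaar", "Week"])

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"), F.col("Week"), F.col("Jaar")))
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))

# Spark 3's default parser rejects 'uwwyyyy', so this action raises the
# SparkUpgradeException quoted above.
df_loaded.show()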

Any ideas how to get from the week and year columns to a date in PySpark?


5 REPLIES

pvignesh92
Honored Contributor

@Pien Derkx Hi, it would be really helpful if you could share some sample values you are trying to cast, along with your expected result, so that we can try it. I have seen that in newer Spark versions you can use a single pattern letter per date field, e.g. MM/dd/yyyy --> M/d/y, and this will work. Please see if you can try the same, e.g. uwy, and let me know.
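For example, a quick sketch of that single-letter behaviour; the sample values are mine, not from the thread:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Single pattern letters match variable-width fields in Spark 3+,
# so "M/d/y" parses both "3/29/2023" and "03/29/2023" to 2023-03-29.
df = spark.createDataFrame([("3/29/2023",), ("03/29/2023",)], ["raw"])
df.withColumn("parsed", F.to_date(F.col("raw"), "M/d/y")).show()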

Pien
New Contributor II

Hi @Vigneshraja Palaniraj! Thanks for the help.

Some example values: I'm getting year = 2023 and week = 13, and I'm combining them into a string:

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"), F.col("Week"), F.col("Jaar")))

I add '3' in front because I want the Wednesday of that week. This results in the string 3132023,

which I use to get the date:

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("3132023"), "uwy"))

desired date I would like to get here is: 2023/03/29 (the wednesday in week 13).

The code gives no errors, but when I try to display the table, the error comes up

pvignesh92
Honored Contributor

@Pien Derkx You can set the Spark SQL time parser policy to LEGACY and retry your steps. Below is what worked for me.

from pyspark.sql.functions import col, to_date

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df.withColumn("date_expected", to_date(col("dateinput"), "uwwyyyy")).show()

dateinput -> 3132023

Output came for date_expected -> 2023-03-29

Please try and let me know if this works.
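For completeness, an end-to-end sketch of this fix wired into the original code. The zero-padding of the week is my own addition so that 'ww' always sees two digits (single-digit weeks would otherwise yield a 6-character string); the sample row is illustrative:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df_loaded = spark.createDataFrame([("2023", "13")], ["Jaar", "Week"])

# '3' (Wednesday) + two-digit week + four-digit year, e.g. "3132023"
df_loaded = df_loaded.withColumn(
    "week_year",
    F.concat(F.lit("3"), F.lpad(F.col("Week"), 2, "0"), F.col("Jaar")))

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))
df_loaded.show()  # per the thread, "3132023" parses to 2023-03-29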

Pien
New Contributor II

works, thanks!

Anonymous
Not applicable

Hi @Pien Derkx

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
