Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Getting date out of year and week

Pien
New Contributor II

Hi all,

I'm trying to get a date out of the columns year and week. The week format is not recognized.  

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))

But it doesn't work and I get this error:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'uwwyyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Any ideas how to get from columns week & year to date in pyspark?



pvignesh92
Honored Contributor

@Pien Derkx​ Hi, it would be really helpful if you could share some sample values you are trying to cast, along with your expected result, so that we can try it ourselves. That said, I have seen that in newer Spark versions you can use single-letter pattern fields for date formats, e.g. MM/dd/yyyy --> M/d/y. Please see if the same works for your pattern, e.g. uwy, and let me know.

Pien
New Contributor II

Hi @Vigneshraja Palaniraj​ ! Thanks for the help.

Some example values: I'm getting year = 2023 and week = 13, which I combine into a single string:

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

I add '3' in front because I want the Wednesday of that week. This results in the string 3132023,

which I use to get the date:

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwy"))

The date I would like to get here is 2023-03-29 (the Wednesday of week 13).

The code gives no errors, but when I try to display the table, the error comes up.
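As a side check outside Spark (my addition, not part of the thread's code), Python's standard library can compute "Wednesday of week 13, 2023" directly, which confirms the expected value:

```python
from datetime import date

# Wednesday (ISO weekday 3) of ISO week 13 in 2023
wednesday = date.fromisocalendar(2023, 13, 3)
print(wednesday)  # 2023-03-29
```

Note that `fromisocalendar` uses ISO-8601 week numbering, which happens to agree with the expected result here but can differ from the legacy parser's locale-dependent week definition.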

pvignesh92
Honored Contributor

@Pien Derkx​ You can set the spark sql formatter to LEGACY and try your steps. Below is what worked for me.

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
 
df.withColumn("date_expected", to_date(col("dateinput"), "uwwyyyy")).show()

dateinput -> 3132023

Output for date_expected -> 2023-03-29

Please try and let me know if this works.
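One caveat worth noting with the concat approach (an editorial observation, not from the thread): for single-digit weeks the concatenated string is only six characters long (e.g. "352023" for week 5), which a two-letter "ww" field can misread. Zero-padding the week first avoids the ambiguity; in PySpark that could be done with `F.lpad(F.col('Week'), 2, '0')`. The string logic, sketched in plain Python with a hypothetical helper:

```python
# Hypothetical helper mirroring the week_year column: "<weekday><ww><yyyy>"
def week_year_string(week, year, weekday=3):
    # zfill(2) guarantees two digits for the 'ww' pattern field
    return str(weekday) + str(week).zfill(2) + str(year)

print(week_year_string(13, 2023))  # 3132023  (Wednesday of week 13)
print(week_year_string(5, 2023))   # 3052023  (week 5, zero-padded)
```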

Pien
New Contributor II

works, thanks!

