Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Getting date out of year and week

Pien
New Contributor II

Hi all,

I'm trying to get a date out of the year and week columns, but the week format is not recognized.

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))

But it doesn't work and I get this error:

SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'uwwyyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Any ideas how to get from the week & year columns to a date in PySpark?


5 REPLIES

pvignesh92
Honored Contributor

@Pien Derkx Hi, it would be really helpful if you could share some sample values you're trying to cast, along with your expected result, so we can try it ourselves. That said, I have seen that in newer Spark versions you can just use single pattern letters for date formats, e.g. MM/dd/yyyy --> M/d/y, and that will work. Please see if you can try the same, like uwy, and let me know.
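
For reference, a minimal sketch of that single-letter shorthand on a plain calendar date (the sample value and DataFrame here are illustrative, not from the thread):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# "M/d/y" accepts the same inputs as "MM/dd/yyyy" under the Spark 3 parser.
df = spark.createDataFrame([("3/29/2023",)], ["d"])
df.select(F.to_date(F.col("d"), "M/d/y").alias("parsed")).show()  # -> 2023-03-29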

Pien
New Contributor II

Hi @Vigneshraja Palaniraj! Thanks for the help.

Some example values: I'm getting year = 2023 and week = 13, and I combine them into a string:

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))

I add '3' in front because I want the Wednesday of that week. This results in this string: 3132023

which I use to get the date:

df_loaded = df_loaded.withColumn("date", F.to_date(F.col("3132023"), "uwy"))

The desired date here is 2023/03/29 (the Wednesday of week 13).

The code itself gives no errors, but when I try to display the table, the error above comes up.
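
This timing makes sense: withColumn is a lazy transformation, so the pattern is only validated when an action forces evaluation. A minimal sketch, assuming the week_year column from above:

# Building the plan raises no error, even with an unsupported pattern.
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwy"))

# The SparkUpgradeException only surfaces when an action (show/display/collect)
# actually executes the plan.
df_loaded.show()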

pvignesh92
Honored Contributor
Accepted Solution

@Pien Derkx You can set the Spark SQL time parser policy to LEGACY and retry your steps. Below is what worked for me.

spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
 
df.withColumn("date_expected", to_date(col("dateinput", "uwwyyyy")).show()

dateinput -> 3132023

Output for date_expected -> 2023-03-29

Please try and let me know if this works.
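
Putting the thread together, a runnable end-to-end sketch of this approach (the SparkSession boilerplate and the lpad zero-padding of single-digit weeks are my additions, not from the thread):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Restore the pre-Spark-3.0 SimpleDateFormat parser so the day-of-week ('u')
# and week-of-year ('ww') pattern letters are accepted.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

df = spark.createDataFrame([(2023, 13)], ["Jaar", "Week"])

# '3' = Wednesday; lpad keeps the week at two digits so that weeks 1-9 also
# parse unambiguously under 'ww' (my assumption, not tested in the thread).
df = df.withColumn(
    "week_year",
    F.concat(
        F.lit("3"),
        F.lpad(F.col("Week").cast("string"), 2, "0"),
        F.col("Jaar").cast("string"),
    ),
)
df = df.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))
df.show()  # week 13 of 2023 -> 2023-03-29

One caveat: the legacy parser's week numbering follows the JVM locale, so the exact day a given week number maps to can differ between environments.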

Pien
New Contributor II

works, thanks!

Anonymous
Not applicable

Hi @Pien Derkx,

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 
