04-12-2023 12:16 AM
Hi all,
I'm trying to build a date from two columns, year and week, but the week pattern is not recognized.
from pyspark.sql import functions as F

df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"), F.col('Week'), F.col('Jaar')))
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))
But it doesn't work and I get this error:
SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'uwwyyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Any ideas how to get from columns week & year to date in pyspark?
04-12-2023 12:59 AM
@Pien Derkx Hi, it would be really helpful if you could share some sample values that you are trying to convert, along with your expected result, so that we can try it. That said, I have seen that in newer Spark versions you can use single-letter date patterns, e.g. MM/dd/yyyy --> M/d/y. Please see if you can try the same with your pattern, like uwy, and let me know.
04-12-2023 01:16 AM
Hi @Vigneshraja Palaniraj ! Thanks for the help.
Some example values: I'm getting year = 2023 and week = 13, which I combine into a string:
df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))
I add '3' in front because I want to have the wednesday of this week. This results in this string: 3132023
which I use to get the date:
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwy"))
desired date I would like to get here is: 2023/03/29 (the wednesday in week 13).
The code gives no errors, but when I try to display the table, the error comes up
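(Editorial note: as a quick sanity check on the expected result, Python's standard library can compute the Wednesday of a given week directly, assuming the Week column holds ISO week numbers.)

```python
from datetime import date

# Wednesday (ISO weekday 3) of ISO week 13 in 2023,
# assuming the Week column uses ISO week numbering.
wednesday = date.fromisocalendar(2023, 13, 3)
print(wednesday)  # 2023-03-29
```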
04-12-2023 02:50 AM
@Pien Derkx You can set the Spark SQL time parser policy to LEGACY and retry your steps. Below is what worked for me.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df.withColumn("date_expected", to_date(col("dateinput"), "uwwyyyy")).show()
dateinput -> 3132023
Output for date_expected -> 2023-03-29
Please try and let me know if this works.
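(Editorial note: if you'd rather not enable the legacy parser globally, the same date can be computed from the year and week columns without any datetime pattern at all, using `datetime.date.fromisocalendar` in a small helper. This is a sketch assuming ISO week numbering; the core logic is plain Python, and the PySpark UDF wiring shown in the comments is a hypothetical example.)

```python
from datetime import date

def iso_week_to_date(year, week, weekday=3):
    """Return the date of the given ISO weekday (3 = Wednesday)
    in the given ISO year/week, or None for missing input."""
    if year is None or week is None:
        return None
    return date.fromisocalendar(int(year), int(week), weekday)

# Hypothetical PySpark wiring (column names from the thread):
# from pyspark.sql import functions as F, types as T
# week_udf = F.udf(iso_week_to_date, T.DateType())
# df_loaded = df_loaded.withColumn("date", week_udf(F.col("Jaar"), F.col("Week")))
```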
04-17-2023 04:13 AM
works, thanks!
04-13-2023 01:57 AM
Hi @Pien Derkx
Thank you for posting your question in our community! We are happy to assist you.
To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?
This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance!