04-12-2023 12:16 AM
Hi all,
I'm trying to get a date out of the columns year and week. The week format is not recognized.
df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"),F.col('Week'), F.col('Jaar')))
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))
It doesn't work, and I get this error:
SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'uwwyyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Any ideas how to get from columns week & year to date in pyspark?
04-12-2023 12:59 AM
@Pien Derkx Hi, it would be really helpful if you could share some sample values you are trying to convert, along with your expected result, so that we can try it ourselves. That said, I have seen that in newer Spark versions you can often use a single pattern letter per field, e.g. MM/dd/yyyy --> M/d/y. Please see if the same works here, i.e. uwy, and let me know.
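For concreteness, here is a minimal sketch of that suggestion (the one-row sample DataFrame and its value are hypothetical; whether the shortened uwy pattern is accepted by the default Spark 3 parser is exactly what is being tested):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical sample: "3" (Wednesday) + week "13" + year "2023"
df = spark.createDataFrame([("3132023",)], ["week_year"])

# Try the shortened pattern; note that week-based fields may still be
# rejected by the default Spark 3 parser, as the follow-up below shows.
df.withColumn("date", F.to_date(F.col("week_year"), "uwy")).show()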
04-12-2023 01:16 AM
Hi @Vigneshraja Palaniraj! Thanks for the help.
Some example values: I'm getting year = 2023 and week = 13, and I combine them into a single string:
df_loaded = df_loaded.withColumn("week_year", F.concat(F.lit("3"), F.col('Week'), F.col('Jaar')))
I add '3' in front because I want the Wednesday of that week. This results in the string: 3132023
which I then use to get the date:
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwy"))
The date I would like to get here is 2023/03/29 (the Wednesday in week 13).
The code itself gives no errors, but when I try to display the table, the error comes up.
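A side note on why nothing fails until the table is displayed: withColumn is a lazy transformation, so the pattern is only evaluated once an action runs. A minimal illustration, assuming the same hypothetical one-row DataFrame as above:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([("3132023",)], ["week_year"])

# No error here: withColumn only records the transformation
df_lazy = df.withColumn("date", F.to_date(F.col("week_year"), "uwy"))

# The SparkUpgradeException surfaces only when an action forces evaluation
df_lazy.show()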
04-12-2023 02:50 AM
@Pien Derkx You can set the Spark SQL time parser policy to LEGACY and then retry your steps. Below is what worked for me.
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")
df.withColumn("date_expected", to_date(col("dateinput", "uwwyyyy")).show()
dateinput -> 3132023
Output I got for date_expected -> 2023-03-29
Please try and let me know if this works.
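Putting the thread together, here is a minimal end-to-end sketch (the one-row sample data is hypothetical; the column names Week and Jaar come from the original post). Setting the policy to LEGACY makes Spark fall back to the pre-3.0 SimpleDateFormat-based parser, which still understands the week-based u and ww fields:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Restore the pre-Spark-3.0 parser, which supports week-based patterns
spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

# Hypothetical sample row with the columns from the original post
df_loaded = spark.createDataFrame([("2023", "13")], ["Jaar", "Week"])

# "3" = Wednesday (day of week), then week and year, e.g. "3132023"
df_loaded = df_loaded.withColumn(
    "week_year", F.concat(F.lit("3"), F.col("Week"), F.col("Jaar"))
)
df_loaded = df_loaded.withColumn("date", F.to_date(F.col("week_year"), "uwwyyyy"))
df_loaded.show()  # expected: 2023-03-29 for week 13 of 2023

Note that spark.conf.set applies session-wide, so consider resetting the policy afterwards if other queries rely on the new parser's behavior.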
04-17-2023 04:13 AM
works, thanks!