06-05-2023 11:52 PM
I have a date column that comes with month-year format and I am trying to convert that into dd-mm-yyyy format in pyspark
for example I have date column with value
Jan-2019
Feb-2020
Mar-2020
the output I am expecting is
01/01/2019
01/02/2020
01/03/2020
here is he code I have written but that comes with null values
df2 = input_df.withColumn("mon-yr",to_date(col("mon-yr"),"MM/dd/yyyy"))
df2.show()
any help is appreciated
Thank you,
Vicks
06-13-2023 07:25 AM
@vikram sinhha : does this answer help?
To convert the date column from the "month-year" format to "dd-mm-yyyy" format in PySpark, you can follow these steps:
Here's an example of how you can modify your code to achieve the desired output:
from pyspark.sql.functions import concat, lit, to_date, date_format
df2 = input_df.withColumn("mon-yr-day", concat(col("mon-yr"), lit("-01")))
df2 = df2.withColumn("date", to_date(col("mon-yr-day"), "MMM-yyyy-dd"))
df2 = df2.withColumn("formatted_date", date_format(col("date"), "dd/MM/yyyy"))
df2.show()
In the above code, we first added a day component ("01") to the month-year values using concat() and lit("-01"). Then, we used to_date() to parse the modified dates with the format "MMM-yyyy-dd". Finally, we used date_format() to convert the parsed dates into the "dd/MM/yyyy" format. This should give you the expected output:
+--------+----------+--------------+
| mon-yr | date | formatted_date|
+--------+----------+--------------+
|Jan-2019|2019-01-01| 01/01/2019|
|Feb-2020|2020-02-01| 01/02/2020|
|Mar-2020|2020-03-01| 01/03/2020|
+--------+----------+--------------+
06-08-2023 07:20 AM
Hi @vikram sinhha , The issue is happening because the input you are providing is not a valid date format. to_date expects you to provide the correct date format. You need to add the date value to the mon-yr column before passing to to_date function
06-09-2023 04:44 AM
Hi @vikram sinhha,
Hope everything is going great.
Does @Lakshay Goel's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you.
Thanks!
06-09-2023 06:30 AM
Hi Vartika,
Thanks for checking.
Unfortunately It was not yet resolved..
Thanks,
Vicks
06-13-2023 07:25 AM
@vikram sinhha : does this answer help?
To convert the date column from the "month-year" format to "dd-mm-yyyy" format in PySpark, you can follow these steps:
Here's an example of how you can modify your code to achieve the desired output:
from pyspark.sql.functions import concat, lit, to_date, date_format
df2 = input_df.withColumn("mon-yr-day", concat(col("mon-yr"), lit("-01")))
df2 = df2.withColumn("date", to_date(col("mon-yr-day"), "MMM-yyyy-dd"))
df2 = df2.withColumn("formatted_date", date_format(col("date"), "dd/MM/yyyy"))
df2.show()
In the above code, we first added a day component ("01") to the month-year values using concat() and lit("-01"). Then, we used to_date() to parse the modified dates with the format "MMM-yyyy-dd". Finally, we used date_format() to convert the parsed dates into the "dd/MM/yyyy" format. This should give you the expected output:
+--------+----------+--------------+
| mon-yr | date | formatted_date|
+--------+----------+--------------+
|Jan-2019|2019-01-01| 01/01/2019|
|Feb-2020|2020-02-01| 01/02/2020|
|Mar-2020|2020-03-01| 01/03/2020|
+--------+----------+--------------+
06-14-2023 12:04 AM
Hi @vikram sinhha
We haven't heard from you since the last response from @Suteja Kanuri . Kindly share the information with us, and in return, we will provide you with the necessary solution.
Thanks and Regards
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group