- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-05-2023 11:52 PM
I have a date column that comes with month-year format and I am trying to convert that into dd-mm-yyyy format in pyspark
for example I have date column with value
Jan-2019
Feb-2020
Mar-2020
the output I am expecting is
01/01/2019
01/02/2020
01/03/2020
here is he code I have written but that comes with null values
df2 = input_df.withColumn("mon-yr",to_date(col("mon-yr"),"MM/dd/yyyy"))
df2.show()
any help is appreciated
Thank you,
Vicks
- Labels:
-
Date
-
Date Column
-
Null
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-13-2023 07:25 AM
@vikram sinhha : does this answer help?
To convert the date column from the "month-year" format to "dd-mm-yyyy" format in PySpark, you can follow these steps:
- First, you need to add a day component to your date values since the to_date() function requires a complete date. You can use the concat() function to append the day to the month-year values.
- Then, you can use the to_date() function with the appropriate date format to parse the modified date values.
- Finally, you can use the date_format() function to convert the parsed dates into the desired "dd-mm-yyyy" format.
Here's an example of how you can modify your code to achieve the desired output:
from pyspark.sql.functions import concat, lit, to_date, date_format
df2 = input_df.withColumn("mon-yr-day", concat(col("mon-yr"), lit("-01")))
df2 = df2.withColumn("date", to_date(col("mon-yr-day"), "MMM-yyyy-dd"))
df2 = df2.withColumn("formatted_date", date_format(col("date"), "dd/MM/yyyy"))
df2.show()
In the above code, we first added a day component ("01") to the month-year values using concat() and lit("-01"). Then, we used to_date() to parse the modified dates with the format "MMM-yyyy-dd". Finally, we used date_format() to convert the parsed dates into the "dd/MM/yyyy" format. This should give you the expected output:
+--------+----------+--------------+
| mon-yr | date | formatted_date|
+--------+----------+--------------+
|Jan-2019|2019-01-01| 01/01/2019|
|Feb-2020|2020-02-01| 01/02/2020|
|Mar-2020|2020-03-01| 01/03/2020|
+--------+----------+--------------+
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-08-2023 07:20 AM
Hi @vikram sinhha , The issue is happening because the input you are providing is not a valid date format. to_date expects you to provide the correct date format. You need to add the date value to the mon-yr column before passing to to_date function
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-09-2023 04:44 AM
Hi @vikram sinhha,
Hope everything is going great.
Does @Lakshay Goel's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you.
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-09-2023 06:30 AM
Hi Vartika,
Thanks for checking.
Unfortunately It was not yet resolved..
Thanks,
Vicks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-13-2023 07:25 AM
@vikram sinhha : does this answer help?
To convert the date column from the "month-year" format to "dd-mm-yyyy" format in PySpark, you can follow these steps:
- First, you need to add a day component to your date values since the to_date() function requires a complete date. You can use the concat() function to append the day to the month-year values.
- Then, you can use the to_date() function with the appropriate date format to parse the modified date values.
- Finally, you can use the date_format() function to convert the parsed dates into the desired "dd-mm-yyyy" format.
Here's an example of how you can modify your code to achieve the desired output:
from pyspark.sql.functions import concat, lit, to_date, date_format
df2 = input_df.withColumn("mon-yr-day", concat(col("mon-yr"), lit("-01")))
df2 = df2.withColumn("date", to_date(col("mon-yr-day"), "MMM-yyyy-dd"))
df2 = df2.withColumn("formatted_date", date_format(col("date"), "dd/MM/yyyy"))
df2.show()
In the above code, we first added a day component ("01") to the month-year values using concat() and lit("-01"). Then, we used to_date() to parse the modified dates with the format "MMM-yyyy-dd". Finally, we used date_format() to convert the parsed dates into the "dd/MM/yyyy" format. This should give you the expected output:
+--------+----------+--------------+
| mon-yr | date | formatted_date|
+--------+----------+--------------+
|Jan-2019|2019-01-01| 01/01/2019|
|Feb-2020|2020-02-01| 01/02/2020|
|Mar-2020|2020-03-01| 01/03/2020|
+--------+----------+--------------+
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-14-2023 12:04 AM
Hi @vikram sinhha
We haven't heard from you since the last response from @Suteja Kanuri . Kindly share the information with us, and in return, we will provide you with the necessary solution.
Thanks and Regards

