Converting the mon-yy format to date, but showing null for output

vicks
New Contributor III

I have a date column in month-year format (for example Jan-2019), and I am trying to convert it to dd/MM/yyyy format in PySpark.

For example, the date column has these values:

Jan-2019

Feb-2020

Mar-2020

The output I am expecting is:

01/01/2019

01/02/2020

01/03/2020

Here is the code I have written, but it returns null values:

df2 = input_df.withColumn("mon-yr",to_date(col("mon-yr"),"MM/dd/yyyy"))

df2.show()

Any help is appreciated.

Thank you,

Vicks

1 ACCEPTED SOLUTION

Accepted Solutions

Anonymous
Not applicable

@vikram sinhha: does this answer help?

The original code returns null because the pattern "MM/dd/yyyy" does not match values such as "Jan-2019". To convert the column from the "month-year" format to the "dd/MM/yyyy" format in PySpark, you can follow these steps:

  1. Add a day component to each value so that it represents a complete date. You can use the concat() function to append the day (for example "-01") to the month-year values.
  2. Use the to_date() function with a pattern that matches the modified values ("MMM-yyyy-dd") to parse them into dates.
  3. Use the date_format() function to convert the parsed dates into the desired "dd/MM/yyyy" format.

Here's an example of how you can modify your code to achieve the desired output:

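from pyspark.sql.functions import col, concat, lit, to_date, date_format

# Append a day so that "Jan-2019" becomes "Jan-2019-01"
df2 = input_df.withColumn("mon-yr-day", concat(col("mon-yr"), lit("-01")))
# Parse the completed string into a date column
df2 = df2.withColumn("date", to_date(col("mon-yr-day"), "MMM-yyyy-dd"))
# Render the date as a dd/MM/yyyy string
df2 = df2.withColumn("formatted_date", date_format(col("date"), "dd/MM/yyyy"))

# Drop the helper column so only the three columns of interest are shown
df2.drop("mon-yr-day").show()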

In the above code, we first added a day component ("01") to the month-year values using concat() and lit("-01"). Then, we used to_date() to parse the modified dates with the format "MMM-yyyy-dd". Finally, we used date_format() to convert the parsed dates into the "dd/MM/yyyy" format. This should give you the expected output:

+--------+----------+--------------+
|  mon-yr|      date|formatted_date|
+--------+----------+--------------+
|Jan-2019|2019-01-01|    01/01/2019|
|Feb-2020|2020-02-01|    01/02/2020|
|Mar-2020|2020-03-01|    01/03/2020|
+--------+----------+--------------+


5 REPLIES 5

Lakshay
Esteemed Contributor

Hi @vikram sinhha, the issue is happening because the format you are passing to to_date ("MM/dd/yyyy") does not match the values in the column (for example "Jan-2019"). to_date expects a format that matches the input string. You need to add the day value to the mon-yr column before passing it to the to_date function.
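
The format mismatch described above can be illustrated without a cluster. The sketch below is only a plain-Python analogy that uses datetime.strptime from the standard library, not Spark: the directives (%b, %Y, %m, %d) differ from Spark's pattern letters, and strptime raises ValueError on a mismatch where to_date returned null in the question. The helper names matches_pattern and month_year_to_ddmmyyyy are hypothetical, and %b assumes the default English month abbreviations.

```python
from datetime import datetime


def matches_pattern(value: str, pattern: str) -> bool:
    """Return True if value can be parsed with the given strptime pattern."""
    try:
        datetime.strptime(value, pattern)
        return True
    except ValueError:
        # The text does not fit the pattern, analogous to the null in Spark.
        return False


def month_year_to_ddmmyyyy(value: str) -> str:
    """Convert text such as 'Jan-2019' to '01/01/2019'."""
    # %b is the abbreviated month name, %Y the four-digit year.
    # strptime fills in day 1 when the pattern has no day field.
    parsed = datetime.strptime(value, "%b-%Y")
    return parsed.strftime("%d/%m/%Y")


for text in ["Jan-2019", "Feb-2020", "Mar-2020"]:
    # The numeric month/day/year pattern does not fit text like 'Jan-2019'.
    print(text, matches_pattern(text, "%m/%d/%Y"), month_year_to_ddmmyyyy(text))
```

The point of the comparison is that the pattern has to describe the text as it actually appears in the column; a numeric month/day/year pattern cannot match an abbreviated month name followed by a year.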

Vartika
Moderator

Hi @vikram sinhha,

Hope everything is going great.

Does @Lakshay Goel's answer help? If it does, would you be happy to mark it as best? If it doesn't, please tell us so we can help you.

Thanks!

vicks
New Contributor III

Hi Vartika,

Thanks for checking.

Unfortunately, it is not resolved yet.

Thanks,

Vicks

Anonymous
Not applicable

Hi @vikram sinhha

We haven't heard from you since the last response from @Suteja Kanuri. Kindly share the requested information with us so that we can provide you with the necessary solution.

Thanks and Regards
