Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Pyspark - How do I convert date/timestamp of format like /Date(1593786688000+0200)/ in pyspark?

BF
New Contributor II

Hi all,

I have a dataframe with a CreateDate column in this format:

CreateDate

/Date(1593786688000+0200)/

/Date(1446032157000+0100)/

/Date(1533904635000+0200)/

/Date(1447839805000+0100)/

/Date(1589451249000+0200)/

and I want to convert that format to a date/timestamp, so the expected output will be:

CreateDate

2020-07-03 14:31:28 +02:00

2015-10-28 11:35:57 +01:00

2018-08-10 12:37:15 +02:00

2015-11-18 09:43:25 +01:00

2020-05-14 10:14:09 +02:00

I have this SQL query that produces the desired output and may help as a starting point:

cast(
    convert(VARCHAR(30),
        DATEADD(Second,
            convert(BIGINT, left(replace(replace(CreateDate, '/date(', ''), ')/', ''), 13)) / 1000,
            '1970-01-01 00:00:00'),
        20)
    + ' ' + '+'
    + left(right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 4), 2)
    + ':'
    + right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 2)
    AS DATETIMEOFFSET(0)) AS CreateDate

Can anyone please help me in achieving this?

Thank you!

1 ACCEPTED SOLUTION

Chaitanya_Raju
Honored Contributor

Hi @Bruno Franco,

Can you please try the code below? I hope it works for you.

from pyspark.sql.functions import col, from_unixtime, split
from pyspark.sql import functions as F

final_df = df_src.withColumn("Final_Timestamp",
                             from_unixtime(F.regexp_extract(col("CreateDate"), r"(\d+)", 1) / 1000,
                                           "yyyy-MM-dd HH:mm:ss")) \
                 .withColumn("Offset_Time",
                             F.regexp_extract(split(df_src["CreateDate"], r"\+")[1], r"(\d+)", 1))
display(final_df)


I divided the value by 1000 because from_unixtime takes its argument in seconds, and your timestamp is in milliseconds.
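The seconds-vs-milliseconds point can be sanity-checked outside Spark with plain Python. This is just a sketch of the same arithmetic, applied to the first sample value from the question:

```python
from datetime import datetime, timezone

raw = "/Date(1593786688000+0200)/"
digits = raw[raw.index("(") + 1 : raw.index("+")]   # "1593786688000" (milliseconds)
offset = raw[raw.index("+") : raw.index(")")]       # "+0200"

# divide by 1000: the payload is epoch milliseconds, fromtimestamp wants seconds
stamp = datetime.fromtimestamp(int(digits) / 1000, timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
print(f"{stamp} {offset[:3]}:{offset[3:]}")  # 2020-07-03 14:31:28 +02:00
```

Note that, like the Spark answer, this formats the epoch value as-is and simply appends the offset string, which is what reproduces the expected output in the question.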

Happy Learning!!

Thanks for reading and like if this is useful and for improvements or feedback please comment.


3 REPLIES


BF
New Contributor II

Thanks a lot @Ratna Chaitanya Raju Bandaru, I picked up your code and did this:

from pyspark.sql.functions import col, concat, from_unixtime, lit, regexp_extract, regexp_replace
from pyspark.sql import functions as F

# pattern reconstructed from the expected output:
# group 1 = epoch milliseconds, group 2 = UTC offset (e.g. "+0200")
pattern = r"(\d+)(\+\d+)"
final_df = df.withColumn("CreateDateNew",
                         concat(from_unixtime(F.regexp_extract(col("CreateDate"), r"(\d+)", 1) / 1000,
                                              "yyyy-MM-dd HH:mm:ss"),
                                lit(" "),
                                regexp_replace(regexp_extract("CreateDate", pattern, 2),
                                               r"(\d{2})(\d{2})", "$1:$2")))

and I got the desired output.

Once again, thanks and kind regards 👍
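The offset-formatting step in the snippet above can be checked outside Spark with Python's re module. This is a sketch under the assumption that the extraction pattern is (\d+)(\+\d+); the regex semantics mirror Spark's regexp_extract and regexp_replace for this input:

```python
import re

# mimic regexp_extract(..., pattern, 2) and regexp_replace on one sample value
value = "/Date(1446032157000+0100)/"
offset = re.search(r"(\d+)(\+\d+)", value).group(2)     # "+0100"
pretty = re.sub(r"(\d{2})(\d{2})$", r"\1:\2", offset)   # insert the colon -> "+01:00"
print(pretty)  # +01:00
```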

Chaitanya_Raju
Honored Contributor

Hi @Bruno Franco,

Glad it helped you

Happy Learning!!

Thanks for reading and like if this is useful and for improvements or feedback please comment.
