01-28-2023 04:51 AM
Hi all,
I've a dataframe with CreateDate column with this format:
CreateDate
/Date(1593786688000+0200)/
/Date(1446032157000+0100)/
/Date(1533904635000+0200)/
/Date(1447839805000+0100)/
/Date(1589451249000+0200)/
and I want to convert that format to date/timestamp, so the excepted output will be:
CreateDate
2020-07-03 14:31:28 +02:00
2015-10-28 11:35:57 +01:00
2018-08-10 12:37:15 +02:00
2015-11-18 09:43:25 +01:00
2020-05-14 10:14:09 +02:00
I have this query in SQL that gives the desired output and that can help to develop:
cast(convert(VARCHAR(30), DATEADD(Second, convert(BIGINT, left(replace(replace(CreateDate, '/date(', ''), ')/', ''), 13)) / 1000, '1970-01-01 00:00:00'), 20) + ' ' + '+' + left(right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 4), 2) + ':' + right(replace(replace(CreateDate, '/date(', ''), ')/', ''), 2) AS DATETIMEOFFSET(0)) AS CreateDate
Can anyone please help me in achieving this?
Thank you!
01-28-2023 08:34 PM
Hi @Bruno Franco ,
Can you please try the below code, hope it might for you.
from pyspark.sql.functions import from_unixtime
from pyspark.sql import functions as F
final_df = df_src.withColumn("Final_Timestamp", from_unixtime((F.regexp_extract(col("CreateDate"), "(\d+)", 1)/1000),"yyyy-MM-dd HH:mm:ss"))\
.withColumn("Offset_Time", F.regexp_extract(split(df_source['CreateDate'],'\\+')[1], "(\d+)", 1))
display(final_df)
I have divided the value with 1000 because from_unixtime takes arguments in seconds, and your
timestamp is in milliseconds.
Happy Learning!!
01-28-2023 08:34 PM
Hi @Bruno Franco ,
Can you please try the below code, hope it might for you.
from pyspark.sql.functions import from_unixtime
from pyspark.sql import functions as F
final_df = df_src.withColumn("Final_Timestamp", from_unixtime((F.regexp_extract(col("CreateDate"), "(\d+)", 1)/1000),"yyyy-MM-dd HH:mm:ss"))\
.withColumn("Offset_Time", F.regexp_extract(split(df_source['CreateDate'],'\\+')[1], "(\d+)", 1))
display(final_df)
I have divided the value with 1000 because from_unixtime takes arguments in seconds, and your
timestamp is in milliseconds.
Happy Learning!!
01-29-2023 03:43 AM
Thanks a lot @Ratna Chaitanya Raju Bandaru , I picked up your code and I did this:
final_df = df.withColumn("CreateDateNew", concat(from_unixtime((F.regexp_extract(col("CreateDate"), "(\d+)", 1)/1000),"yyyy-MM-dd HH:mm:ss"), lit(" "), regexp_replace(regexp_extract("CreateDate", re, 2), "(\\d{2})(\\d{2})" , "$1:$2" )))
and I got the desired output:
Once again, thanks and kind regards 👍
01-29-2023 04:00 AM
Hi @Bruno Franco ,
Glad it helped you
Happy Learning!!
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group