How can I add a duration in milliseconds to a timestamp?

Merchiv
New Contributor III

Let's say I have a DataFrame with a timestamp column and an offset column in milliseconds, stored as timestamp and long types respectively.

E.g.

from datetime import datetime
df = spark.createDataFrame(
    [
        (datetime(2021, 1, 1), 1500, ),
        (datetime(2021, 1, 2), 1200, )
    ],
    ["timestamp", "offsetmillis", ],
)

Now I want to add these offsets to the datetime, so that I get:

2021-01-01T00:00:01.500 and 2021-01-02T00:00:01.200
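For reference, the expected values can be checked outside Spark with plain Python. This is only a sanity check of the arithmetic using `datetime` and `timedelta`, not a Spark solution; the variable names `rows` and `expected` are made up for this illustration.

```python
from datetime import datetime, timedelta

# Same sample data as in the DataFrame above: (timestamp, offset in milliseconds).
rows = [
    (datetime(2021, 1, 1), 1500),
    (datetime(2021, 1, 2), 1200),
]

# Add each millisecond offset to its timestamp using a timedelta.
expected = [ts + timedelta(milliseconds=offset) for ts, offset in rows]

for value in expected:
    # Print with millisecond precision, e.g. 2021-01-01T00:00:01.500
    print(value.isoformat(timespec="milliseconds"))
```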

If I add these directly I get an error about type mismatch, which does make sense:

[DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES] Cannot resolve "(timestamp + offsetmillis)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("TIMESTAMP" and "BIGINT")

However, I'm not sure how best to cast the offset to a duration or interval.

1 ACCEPTED SOLUTION

Accepted Solutions

Lakshay
Esteemed Contributor

Hi @Ivo Merchiers,

Here is how I did it. As you mentioned, I used a timestamp with millisecond precision in the "ts" column and the offset to add in the "offSetMillis" column. I first converted the "ts" column to milliseconds, then added "offSetMillis" to it, and finally converted the result back to a timestamp in the "new_ts" column.

[Screenshot 2023-02-06 at 6.50.51 PM]

4 REPLIES 4

Lakshay
Esteemed Contributor

Hi @Ivo Merchiers, if you are just trying to create a date with milliseconds, you can create it directly by passing the value to datetime, as below.

[Screenshot 2023-02-04 at 12.28.02 AM]

However, if your use case is to add milliseconds to the date value, then you have to convert the date to milliseconds before adding the offset to it.
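The first suggestion, building the value directly in `datetime`, might look like the sketch below. The screenshot is not reproduced here, so this is an assumption about what was meant; the variable name `rows` is hypothetical.

```python
from datetime import datetime

# The seventh positional argument of datetime is microseconds,
# so 1.5 seconds is written as 1 second plus 500_000 microseconds.
rows = [
    (datetime(2021, 1, 1, 0, 0, 1, 500_000),),
    (datetime(2021, 1, 2, 0, 0, 1, 200_000),),
]

for (ts,) in rows:
    print(ts.isoformat(timespec="milliseconds"))
```

A list like `rows` could then be passed to `spark.createDataFrame(rows, ["timestamp"])`, which only helps when the values are known up front rather than read from a table.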

Merchiv
New Contributor III

Hi @Lakshay Goel,

I only added the `spark.createDataFrame` command here as an example. The real data comes from existing tables, so I can't set the values in the Python initialisation.

I want to add some milliseconds (in integer, long, or whatever format) to a timestamp (which should already have millisecond precision) in PySpark.

How would I go about doing the second approach you proposed?

Merchiv
New Contributor III

Although @Lakshay Goel's solution works, we've been using an alternative approach that we found to be a bit more readable:

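from pyspark.sql import Column, functions as f


def make_dt_interval_sec(col: Column) -> Column:
    # Build a day-time interval from a column holding (fractional) seconds.
    # Note: `_jc` is a private handle to the JVM column, used here only to
    # render the column as SQL text inside the expression.
    return f.expr(f"make_dt_interval(0, 0, 0, {col._jc.toString()})")


start_col = "start_timestamp"  # placeholder name for the output column

df.withColumn(
    start_col,
    # This subtracts the offset; use `+` instead to add it to the timestamp.
    f.col("timestamp") - make_dt_interval_sec(f.col("offsetmillis") / 1000),
)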

I'm not sure if there is any performance difference between both methods.
