Data Engineering

How can I add a duration in milliseconds to a timestamp?

Merchiv
New Contributor III

Let's say I have a DataFrame with a timestamp column (timestamp type) and an offset column in milliseconds (long type).

E.g.

from datetime import datetime
df = spark.createDataFrame(
    [
        (datetime(2021, 1, 1), 1500, ),
        (datetime(2021, 1, 2), 1200, )
    ],
    ["timestamp", "offsetmillis", ],
)

Now I want to add these offsets to the datetime, so that I get:

2021-01-01T00:00:01.500 and 2021-01-02T00:00:01.200

If I add these directly I get an error about type mismatch, which does make sense:

[DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES] Cannot resolve "(timestamp + offsetmillis)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("TIMESTAMP" and "BIGINT")

However I'm not sure how I can best cast this to a duration or interval.

1 ACCEPTED SOLUTION

Lakshay
Esteemed Contributor

Hi @Ivo Merchiers,

Here is how I did it. As you mentioned, I take a timestamp with milliseconds as input in the "ts" column and the offset to be added in the "offSetMillis" column. First, I converted the "ts" column to milliseconds, then added "offSetMillis" to it, and finally converted the resulting value back to a timestamp in the "new_ts" column.

[Image: Screenshot 2023-02-06 at 6.50.51 PM]
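
Since the screenshot is not reproduced here, the following is a minimal sketch of the round trip described above, not the original code. It assumes Spark 3.1 or later, where the SQL functions `unix_millis` and `timestamp_millis` are available. The helper names `add_millis_via_epoch` and `add_millis_column` are made up for illustration. The plain-Python function only models the arithmetic, so the expected values from the question can be checked without a cluster.

```python
from datetime import datetime, timedelta


def add_millis_via_epoch(ts: datetime, offset_millis: int) -> datetime:
    """Plain-Python model of the round trip: timestamp -> epoch millis -> timestamp."""
    epoch = datetime(1970, 1, 1)
    # Whole milliseconds since the epoch (anything below 1 ms is dropped).
    epoch_millis = (ts - epoch) // timedelta(milliseconds=1)
    return epoch + timedelta(milliseconds=epoch_millis + offset_millis)


def add_millis_column(df, ts_col="ts", offset_col="offSetMillis", out_col="new_ts"):
    """Same idea on a Spark DataFrame (hypothetical helper, assumes Spark 3.1+)."""
    # Imported lazily so the plain-Python part runs without PySpark installed.
    from pyspark.sql import functions as F

    # unix_millis truncates to millisecond precision; timestamp_millis converts back.
    return df.withColumn(
        out_col,
        F.expr(f"timestamp_millis(unix_millis({ts_col}) + {offset_col})"),
    )


# The two rows from the question, checked with the plain-Python model.
print(add_millis_via_epoch(datetime(2021, 1, 1), 1500).isoformat(timespec="milliseconds"))
print(add_millis_via_epoch(datetime(2021, 1, 2), 1200).isoformat(timespec="milliseconds"))
```

Note that this route drops any sub-millisecond part of the original timestamp, which may or may not matter for a given table.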

4 REPLIES 4

Lakshay
Esteemed Contributor

Hi @Ivo Merchiers, if you are just trying to create a timestamp with milliseconds, you can create it directly by providing the value in datetime, as below.

[Image: Screenshot 2023-02-04 at 12.28.02 AM]

However, if your use case is to add milliseconds to an existing timestamp value, then you have to convert the timestamp to milliseconds before adding the offset to it.

Merchiv
New Contributor III

Hi @Lakshay Goel,

I've only used the `spark.createDataFrame` command here as an example. The real data comes from existing tables, so I can't do it in the Python initialisation.

I want to add a number of milliseconds (stored as an integer/long) to a timestamp (which should already have millisecond precision) in PySpark.

How would I go about doing the second approach you proposed?

Merchiv
New Contributor III

Although @Lakshay Goel's solution works, we've been using an alternative approach that we found to be a bit more readable:

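from pyspark.sql import Column, functions as f


def make_dt_interval_sec(col: Column) -> Column:
    # Build a day-time interval of `col` seconds with the SQL function make_dt_interval.
    # Note: col._jc is a private attribute; this relies on its string form being valid SQL.
    return f.expr(f"make_dt_interval(0, 0, 0, {col._jc.toString()})")


start_col = "start"  # name of the output column (placeholder)

# This example subtracts the offset; use + instead of - to add it, as in the question.
df.withColumn(
    start_col,
    f.col("timestamp") - make_dt_interval_sec(f.col("offsetmillis") / 1000),
)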

I'm not sure if there is any performance difference between both methods.
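
A possible variation, sketched under the assumption that the offset column can be referenced by a plain name: building the interval expression from the column name avoids the private `_jc` attribute. The helper names `millis_interval_sql` and `add_offset` and the output column `shifted` are hypothetical, and the sketch adds the offset, as in the original question.

```python
def millis_interval_sql(offset_col: str) -> str:
    """SQL text for a day-time interval of `offset_col` milliseconds."""
    # make_dt_interval(days, hours, mins, secs): only the seconds argument is used here.
    return f"make_dt_interval(0, 0, 0, {offset_col} / 1000)"


def add_offset(df, ts_col="timestamp", offset_col="offsetmillis", out_col="shifted"):
    """Add the millisecond offset to the timestamp column (hypothetical helper)."""
    # Imported lazily so the string helper can be used without PySpark installed.
    from pyspark.sql import functions as f

    return df.withColumn(out_col, f.col(ts_col) + f.expr(millis_interval_sql(offset_col)))


# The expression that ends up inside f.expr for the column from the question.
print(millis_interval_sql("offsetmillis"))
```

Column names containing spaces or special characters would need backtick quoting in the generated SQL. On Spark 3.3 or later, `timestampadd(MILLISECOND, offsetmillis, timestamp)` inside `f.expr` should be another option; no comparison between the two is made here.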
