How can I add a duration in milliseconds to a timestamp?

Merchiv
New Contributor III

Let's say I have a DataFrame with a timestamp column and an offset column in milliseconds, of type timestamp and long respectively.

E.g.

from datetime import datetime
df = spark.createDataFrame(
    [
        (datetime(2021, 1, 1), 1500, ),
        (datetime(2021, 1, 2), 1200, )
    ],
    ["timestamp", "offsetmillis", ],
)

Now I want to add these offsets to the datetime, so that I get:

2021-01-01T00:00:01.500 and 2021-01-02T00:00:01.200

If I add these directly I get an error about type mismatch, which does make sense:

[DATATYPE_MISMATCH.BINARY_OP_DIFF_TYPES] Cannot resolve "(timestamp + offsetmillis)" due to data type mismatch: the left and right operands of the binary operator have incompatible types ("TIMESTAMP" and "BIGINT")

However, I'm not sure how best to cast this to a duration or interval.

Accepted solution

Lakshay
Databricks Employee

Hi @Ivo Merchiers,

Here is how I did it. As in your example, I take a timestamp with millisecond precision in the "ts" column and the offset to add in the "offSetMillis" column. First I convert the "ts" column to milliseconds, then I add "offSetMillis" to it, and finally I convert the result back to a timestamp in the "new_ts" column.

Screenshot 2023-02-06 at 6.50.51 PM

Replies

Lakshay
Databricks Employee

Hi @Ivo Merchiers, if you are just trying to create a date with milliseconds, you can create it directly by providing the value in datetime as below.
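
The screenshot is not available in this text. As a hedged sketch of what creating the values directly could look like: Python's `datetime` takes microseconds as its seventh positional argument, so 500 ms is written as 500_000. The variable name `rows` is only illustrative.

```python
from datetime import datetime

# datetime(year, month, day, hour, minute, second, microsecond):
# the last argument is in microseconds, so 500 ms is 500_000.
rows = [
    (datetime(2021, 1, 1, 0, 0, 1, 500_000),),
    (datetime(2021, 1, 2, 0, 0, 1, 200_000),),
]

print(rows[0][0].isoformat(timespec="milliseconds"))  # 2021-01-01T00:00:01.500
```

These rows could then be passed to `spark.createDataFrame(rows, ["timestamp"])` in the same way as in the question.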

Screenshot 2023-02-04 at 12.28.02 AM

However, if your use case is to add milliseconds to the date value, then you have to convert the date to milliseconds before adding the offset to it.

Merchiv
New Contributor III

Hi @Lakshay Goel,

I only added the `spark.createDataFrame` command as an example; the real data comes from existing tables, so I can't do it in the Python initialisation.

I want to add a number of milliseconds (in integer/long/whatever format) to a timestamp (which should already have millisecond precision) in PySpark.

How would I go about doing the second approach you proposed?

Merchiv
New Contributor III

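Although @Lakshay Goel's solution works, we've been using an alternative approach that we found to be a bit more readable:

from pyspark.sql import Column, functions as f


def make_dt_interval_sec(col: Column):
    # make_dt_interval(days, hours, mins, secs) builds a day-time interval.
    # Note: col._jc is a private attribute used to get the column's SQL text.
    return f.expr(f"make_dt_interval(0, 0, 0, {col._jc.toString()})")


start_col = "start_timestamp"  # hypothetical output column name

# This subtracts the offset; use + to add it, as in the original question.
df.withColumn(
    start_col,
    f.col("timestamp") - make_dt_interval_sec(f.col("offsetmillis") / 1000),
)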

I'm not sure if there is any performance difference between both methods.