to_utc_timestamp is subtracting a time delta that from_utc_timestamp is not adding back.

rahullalwani25
New Contributor II

My session timezone is Australia/Sydney

If I run the query below, my expectation is that the first and third columns should show the same value. But it does not work as expected for the 1753-01-01 00:00:00 timestamp.

spark.conf.set("spark.sql.session.timeZone", "Australia/Sydney")

SELECT
    '1753-01-01 00:00:00' AS original_timestamp,
    to_utc_timestamp('1753-01-01 00:00:00', 'Australia/Sydney') AS to_utc_from_aest,
    from_utc_timestamp(to_utc_timestamp('1753-01-01 00:00:00', 'Australia/Sydney'), 'Australia/Sydney') AS from_utc_to_aest

UNION ALL

SELECT
    '2023-01-01 00:00:00',
    to_utc_timestamp('2023-01-01 00:00:00', 'Australia/Sydney'),
    from_utc_timestamp(to_utc_timestamp('2023-01-01 00:00:00', 'Australia/Sydney'), 'Australia/Sydney')

[Screenshot: query results]

4 REPLIES

Kaniz
Community Manager

Hi @Rahul Lalwani, the issue you're facing with the timestamp '1753-01-01 00:00:00' is related to a limitation in the date range supported by Spark. Spark's default date range starts from 1900, and any earlier date may not be handled accurately.

You can work around this limitation by using a timestamp within Spark's supported date range. Here's an updated version of your query that uses '1900-01-01 00:00:00' instead of '1753-01-01 00:00:00':

spark.conf.set("spark.sql.session.timeZone", "Australia/Sydney")
 
SELECT
    '1900-01-01 00:00:00' AS original_timestamp,
    to_utc_timestamp('1900-01-01 00:00:00', 'Australia/Sydney') AS to_utc_from_aest,
    from_utc_timestamp(to_utc_timestamp('1900-01-01 00:00:00', 'Australia/Sydney'), 'Australia/Sydney') AS from_utc_to_aest
 
UNION ALL
 
SELECT
    '2023-01-01 00:00:00',
    to_utc_timestamp('2023-01-01 00:00:00', 'Australia/Sydney'),
    from_utc_timestamp(to_utc_timestamp('2023-01-01 00:00:00', 'Australia/Sydney'), 'Australia/Sydney')

By using '1900-01-01 00:00:00', you should see the first and third columns showing the same value in the result.

Hello Kaniz,

It works for timestamps as far back as 1896-01-01 00:00:00, which rules out Spark's supported date range as the cause.

Secondly, I'm not asking for the historically accurate timestamp; my expectation is that the time delta subtracted by to_utc_timestamp should be exactly the value added back by from_utc_timestamp (and vice versa). But as you can see in the snippets, there is a ±5 minute discrepancy.

[Screenshot: query results showing the ±5 minute discrepancy]

P.S. It works correctly on a job cluster and a SQL warehouse. The issue only occurs on an interactive cluster.
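As a minimal sketch of the round-trip check described above (assuming a Databricks-style notebook where the SparkSession is available as spark; the DataFrame setup and the column names as_utc/back_to_aest are illustrative, not from the original post):

from pyspark.sql import functions as F

spark.conf.set("spark.sql.session.timeZone", "Australia/Sydney")

# Two sample local timestamps as strings; Spark casts them to timestamps implicitly.
df = spark.createDataFrame(
    [("1753-01-01 00:00:00",), ("2023-01-01 00:00:00",)], ["original"]
)

round_trip = df.select(
    "original",
    F.to_utc_timestamp("original", "Australia/Sydney").alias("as_utc"),
    F.from_utc_timestamp(
        F.to_utc_timestamp("original", "Australia/Sydney"), "Australia/Sydney"
    ).alias("back_to_aest"),
)

# Expectation: `original` and `back_to_aest` agree row for row; on the affected
# interactive cluster the 1753 row reportedly comes back about 5 minutes off.
round_trip.show(truncate=False)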

Vartika
Moderator

Hi @Rahul Lalwani​,

Does @Kaniz Fatma's response answer your question? If yes, would you be happy to mark it as the best answer so that other members can find the solution more quickly? Otherwise, please let us know if you need more help.

Thanks!

Pavithra_R
New Contributor II

Hi @Rahul Lalwani,

On an interactive cluster, spark.sql.datetime.java8API.enabled is disabled by default. When we set spark.sql.datetime.java8API.enabled to true, we see correct values for 1753-01-01 as well.

The reason for enabling this config is that, with it on, date and timestamp string literals are parsed using the Java 8 time API and Spark's session time zone. Before this change, date/timestamp values were collected as the legacy types `java.sql.Date`/`java.sql.Timestamp`, and values of those types did not respect the config `spark.sql.session.timeZone`. To get a consistent view, users had to keep the JVM time zone and Spark's session time zone in sync.

So please set spark.sql.datetime.java8API.enabled to true to get the correct values.
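A rough sketch of this fix (assuming the config can be set per session in the notebook; on Databricks it can also be added to the cluster's Spark config as spark.sql.datetime.java8API.enabled true):

# Enable the Java 8 time API for date/timestamp handling (assumption: setting it
# mid-session takes effect for queries executed afterwards).
spark.conf.set("spark.sql.datetime.java8API.enabled", "true")
spark.conf.set("spark.sql.session.timeZone", "Australia/Sydney")

spark.sql("""
    SELECT
        '1753-01-01 00:00:00' AS original_timestamp,
        to_utc_timestamp('1753-01-01 00:00:00', 'Australia/Sydney') AS to_utc_from_aest,
        from_utc_timestamp(
            to_utc_timestamp('1753-01-01 00:00:00', 'Australia/Sydney'),
            'Australia/Sydney'
        ) AS from_utc_to_aest
""").show(truncate=False)

# Per this reply, original_timestamp and from_utc_to_aest should now match.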

And the reason for the 5-minute difference is the standardization of time zones: until 1895 the offset was UTC+10:05, and from 1896 onwards the offset is UTC+10:00.
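For context, the historical offset can be inspected outside Spark with Python's standard-library zoneinfo (Python 3.9+); the exact pre-1896 value depends on the tzdata release installed on the machine, so treat the printed offsets as indicative:

from datetime import datetime
from zoneinfo import ZoneInfo

sydney = ZoneInfo("Australia/Sydney")
for ts in ("1753-01-01 00:00:00", "1896-01-01 00:00:00", "2023-01-01 00:00:00"):
    local = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=sydney)
    print(ts, "UTC offset:", local.utcoffset())

# Typically prints roughly +10:05 (local mean time) for 1753, +10:00 for 1896,
# and +11:00 for 2023-01-01 (daylight saving time in Sydney).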