Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Automatic conversion of timestamp to the default timezone

ata_lh
New Contributor II

I am running into an issue when ingesting data from ADLS XML or JSON files and processing them with PySpark (Auto Loader or just reading a DataFrame): the timestamps are automatically converted to the default timezone, and my data contains varying timezone offsets. Has anyone found a way to prevent this conversion? I tried setting

spark.conf.set("spark.sql.session.timeZone", "UTC"), but it does not help; the conversion still happens, as shown below:
Original local time: 2024-06-21 20:50:00
Offset: +08:00 (the local time is 8 hours ahead of UTC)
Subtract 8 hours: 2024-06-21 20:50:00 - 8:00:00 = 2024-06-21 12:50:00
Result in UTC: 2024-06-21T12:50:00Z
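
For reference, a minimal PySpark sketch that reproduces this (the column names and the literal value are made up for illustration): once the string is cast to a regular TIMESTAMP, which is timezone-aware (TIMESTAMP_LTZ), the +08:00 offset is applied and the value is rendered in the session timezone, so spark.sql.session.timeZone only changes how the instant is displayed, not whether the conversion happens.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Controls how TIMESTAMP (TIMESTAMP_LTZ) values are displayed and how strings
# without an offset are interpreted; it does not preserve the source local time.
spark.conf.set("spark.sql.session.timeZone", "UTC")

df = (
    spark.createDataFrame([("2024-06-21T20:50:00+08:00",)], ["event_ts_raw"])
    .withColumn("event_ts", F.col("event_ts_raw").cast("timestamp"))
)
df.show(truncate=False)
# event_ts shows 2024-06-21 12:50:00 (the UTC instant); the original wall-clock
# time 20:50 and the +08:00 offset are no longer stored with the value.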
1 REPLY

ata_lh
New Contributor II

Hi @Retired_mod,

The point is that for our project we need the timestamp attributes to stay exactly as they come from the source system, so our aim is to keep them without any timezone conversion.

These are the tests I have done so far:

1. During ingestion, using "cloudFiles.schemaHints" to cast all timestamp columns to STRING.

2. Casting the string to TIMESTAMP_NTZ. But since some attributes contain only a time, with no date included, the cast returns null. (A sketch of both steps follows below.)
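
For context, this is roughly what the two steps look like. It is only a sketch: the path, schema location and the column name event_ts are placeholders, spark is the notebook's SparkSession, and casting to TIMESTAMP_NTZ assumes a runtime that supports that type (Spark 3.4+ / a recent Databricks Runtime).

from pyspark.sql import functions as F

raw = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    # Test 1: hint the column as STRING so Auto Loader does not infer it as a
    # timezone-aware timestamp during schema inference.
    .option("cloudFiles.schemaHints", "event_ts STRING")
    .option("cloudFiles.schemaLocation", "<schema-location>")   # placeholder
    .load("<adls-path>")                                        # placeholder
)

# Test 2: cast the string to TIMESTAMP_NTZ so no session-timezone shift applies.
# Values that contain only a time (e.g. "20:50:00" with no date part) cannot be
# cast this way and come back as null, which is the problem described in point 2.
parsed = raw.withColumn("event_ts_ntz", F.col("event_ts").cast("timestamp_ntz"))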

Is there any option to disable the automatic conversion when the schema is inferred?
