cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Automatic conversion of timestamp to the default timezone

ata_lh
New Contributor II

I am encountering the issue when ingesting data from adls xml or json files to process them via Pyspark (Autoloader or just reading df). The timestamp is automatically converted to the default timezone.And I have  dynamically timezone values. Did anyone of you has found a way how to not let the conversion happened ? I tried to set the 

spark.conf.set("spark.sql.session.timeZone", "UTC"), but it is not working. 
Original local time: 2024-06-21 20:50:00
the Offset: +08:00 : This means the local time is 8 hours ahead of UTC.
Subtract 8 hours: 2024-06-21 20:50:00 - 8:00:00 = 2024-06-21 12:50:00
Result in UTC: 2024-06-21T12:50:00Z
2 REPLIES 2

Kaniz_Fatma
Community Manager
Community Manager

Hi @ata_lh

  1. You mentioned that you tried setting the timezone using spark.conf.set("spark.sql.session.timeZone", "UTC"). This should work, but make sure you set it before reading your data.
  2. If your input data contains a timestamp column with timezone information, you can use from_utc_timestamp to convert it to a specific timezone.
  3. Alternatively, you can set the session timezone to the desired one (e.g., ‘America/New_York’) and use to_timestamp.
  4. While UDFs can be useful for custom logic, the above approaches should work without them.If you encounter any issues, feel free to ask for further assistance! 

ata_lh
New Contributor II

Hi @Kaniz_Fatma  , 

The point is that in the aim of our project, we need the timestamp attribute to be as they are from the source system. So basically our aim would be to have the attribute without the timezone conversion. 

I did the below tests so far:

1. during ingestion using the "cloudFiles.schemaHints" which cast all timestamp to STRING

2. cast the string to the TIMESTAMP_NTZ. But since i have some attributes containing only the time, not the date included when casting it gives null. 

Is there any possible option that we can disable the automatic conversion when the schema is inferred?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group