from_utc_timestamp gives strange results
03-27-2025 02:15 AM
I don't understand why from_utc_timestamp(col("original_time"), "Europe/Berlin") changes the timestamp value instead of just setting the timezone. That's non-intuitive behaviour.
spark.conf.set("spark.sql.session.timeZone", "UTC")

from pyspark.sql import Row
from pyspark.sql.types import StructType, StructField, StringType
from pyspark.sql.functions import col, from_utc_timestamp, unix_timestamp

# One row at the Unix epoch, given as a string
data = [Row(original_time="1970-01-01 00:00")]
schema = StructType([StructField("original_time", StringType(), True)])
df = spark.createDataFrame(data, schema)

# Cast to timestamp and record the underlying epoch seconds
df = df.withColumn("original_time", col("original_time").cast("timestamp"))
df = df.withColumn("original_time_int", unix_timestamp(col("original_time")))

# "Convert" to Europe/Berlin and record the epoch seconds again
df = df.withColumn("berlin_time", from_utc_timestamp(col("original_time"), "Europe/Berlin"))
df = df.withColumn("berlin_time_int", unix_timestamp(col("berlin_time")))

display(df)
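
For reference, with the session timezone at UTC, display(df) ends up showing something like this (berlin_time is shifted by a full hour rather than keeping the same instant):

original_time         original_time_int   berlin_time           berlin_time_int
1970-01-01 00:00:00   0                   1970-01-01 01:00:00   3600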
1 REPLY
3 weeks ago
Hello @nielsehlers!
Just to clarify: PySpark's from_utc_timestamp converts a UTC timestamp to the specified timezone (Europe/Berlin here) by adjusting the actual timestamp value, not by attaching timezone metadata. PySpark timestamps are stored as absolute instants (epoch time) with no timezone information, so there is nothing to "set"; instead, the function shifts the instant so that it reads as the corresponding Europe/Berlin wall-clock time.
For more info: pyspark.sql.functions.from_utc_timestamp
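
To make the shift concrete, here is a minimal sketch (assuming a Spark 3.x session available as spark; timestamp_seconds is only used to pin the row exactly to the Unix epoch):

from pyspark.sql.functions import col, from_utc_timestamp, unix_timestamp, timestamp_seconds, lit

spark.conf.set("spark.sql.session.timeZone", "UTC")

# One row whose timestamp is exactly the Unix epoch (1970-01-01 00:00:00 UTC)
df = spark.range(1).select(timestamp_seconds(lit(0)).alias("ts"))

# from_utc_timestamp moves the stored instant itself: epoch 0 becomes epoch 3600,
# because Europe/Berlin was UTC+1 on 1970-01-01. Rendered in UTC it now reads 01:00.
df.select(
    unix_timestamp("ts").alias("ts_epoch"),                                           # 0
    from_utc_timestamp("ts", "Europe/Berlin").alias("berlin_time"),                   # 1970-01-01 01:00:00
    unix_timestamp(from_utc_timestamp("ts", "Europe/Berlin")).alias("berlin_epoch"),  # 3600
).show(truncate=False)

# If the goal is only to display the same instant as Berlin local time, changing the
# session timezone is enough; the instant (epoch 0) stays the same, only the rendering changes.
spark.conf.set("spark.sql.session.timeZone", "Europe/Berlin")
df.select(col("ts"), unix_timestamp("ts").alias("ts_epoch")).show(truncate=False)  # 1970-01-01 01:00:00, 0

Whether the second approach fits depends on whether downstream code also relies on the session timezone.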

