<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: from_utc_time gives strange results in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/from-utc-time-gives-strange-results/m-p/114521#M44853</link>
    <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155431"&gt;@nielsehlers&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Just to clarify,&amp;nbsp;PySpark's &lt;EM&gt;from_utc_timestamp&lt;/EM&gt; converts a UTC timestamp to the specified timezone (in this case it's Europe/Berlin), adjusting the actual timestamp value rather than just setting timezone metadata. This happens because PySpark timestamps are stored as absolute instants (epoch time) without timezone information, and the function recalculates it in the given timezone.&lt;/P&gt;
&lt;P&gt;For more info:&amp;nbsp;&lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_utc_timestamp.html#pyspark-sql-functions-from-utc-timestamp" target="_self"&gt;pyspark.sql.functions.from_utc_timestamp&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Fri, 04 Apr 2025 13:22:39 GMT</pubDate>
    <dc:creator>Advika</dc:creator>
    <dc:date>2025-04-04T13:22:39Z</dc:date>
    <item>
      <title>from_utc_time gives strange results</title>
      <link>https://community.databricks.com/t5/data-engineering/from-utc-time-gives-strange-results/m-p/113787#M44633</link>
      <description>&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;I don't understand why from_utc_time(col("original_time"), "Europe/Berlin") changes the timestamp instead of just setting the timezone. That's a non-intuitive behaviour.&amp;nbsp;&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;spark.conf.&lt;/SPAN&gt;&lt;SPAN&gt;set&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"spark.sql.session.timeZone"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"UTC"&lt;/SPAN&gt;&lt;SPAN&gt;)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; pyspark.sql &lt;/SPAN&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; Row&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; pyspark.sql.types &lt;/SPAN&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; StructType, StructField, TimestampType,StringType&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;from&lt;/SPAN&gt;&lt;SPAN&gt; pyspark.sql.functions &lt;/SPAN&gt;&lt;SPAN&gt;import&lt;/SPAN&gt;&lt;SPAN&gt; col,from_utc_timestamp,unix_timestamp&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;data &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; [&lt;/SPAN&gt;&lt;SPAN&gt;Row&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;Zeit&lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt;"1970-01-01 00:00"&lt;/SPAN&gt;&lt;SPAN&gt;)]&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;schema &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;StructType&lt;/SPAN&gt;&lt;SPAN&gt;([&lt;/SPAN&gt;&lt;SPAN&gt;StructField&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"original_time"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;StringType&lt;/SPAN&gt;&lt;SPAN&gt;(), &lt;/SPAN&gt;&lt;SPAN&gt;True&lt;/SPAN&gt;&lt;SPAN&gt;)])&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; spark.&lt;/SPAN&gt;&lt;SPAN&gt;createDataFrame&lt;/SPAN&gt;&lt;SPAN&gt;(data, schema)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;BR /&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; df.&lt;/SPAN&gt;&lt;SPAN&gt;withColumn&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"original_time"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;col&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"original_time"&lt;/SPAN&gt;&lt;SPAN&gt;).&lt;/SPAN&gt;&lt;SPAN&gt;cast&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"timestamp"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; df.&lt;/SPAN&gt;&lt;SPAN&gt;withColumn&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"original_time_int"&lt;/SPAN&gt;&lt;SPAN&gt;,&lt;/SPAN&gt;&lt;SPAN&gt;unix_timestamp&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;col&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"original_time"&lt;/SPAN&gt;&lt;SPAN&gt;)))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; df.&lt;/SPAN&gt;&lt;SPAN&gt;withColumn&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"berlin_time"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;from_utc_timestamp&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;col&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"original_time"&lt;/SPAN&gt;&lt;SPAN&gt;), &lt;/SPAN&gt;&lt;SPAN&gt;"Europe/Berlin"&lt;/SPAN&gt;&lt;SPAN&gt;))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;df &lt;/SPAN&gt;&lt;SPAN&gt;=&lt;/SPAN&gt;&lt;SPAN&gt; df.&lt;/SPAN&gt;&lt;SPAN&gt;withColumn&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"berlin_time_int"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;unix_timestamp&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;col&lt;/SPAN&gt;&lt;SPAN&gt;(&lt;/SPAN&gt;&lt;SPAN&gt;"berlin_time"&lt;/SPAN&gt;&lt;SPAN&gt;)))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;display&lt;/SPAN&gt;&lt;SPAN&gt;(df)&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 27 Mar 2025 09:15:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-utc-time-gives-strange-results/m-p/113787#M44633</guid>
      <dc:creator>nielsehlers</dc:creator>
      <dc:date>2025-03-27T09:15:49Z</dc:date>
    </item>
    <item>
      <title>Re: from_utc_time gives strange results</title>
      <link>https://community.databricks.com/t5/data-engineering/from-utc-time-gives-strange-results/m-p/114521#M44853</link>
      <description>&lt;P&gt;Hello&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/155431"&gt;@nielsehlers&lt;/a&gt;!&lt;/P&gt;
&lt;P&gt;Just to clarify,&amp;nbsp;PySpark's &lt;EM&gt;from_utc_timestamp&lt;/EM&gt; converts a UTC timestamp to the specified timezone (in this case it's Europe/Berlin), adjusting the actual timestamp value rather than just setting timezone metadata. This happens because PySpark timestamps are stored as absolute instants (epoch time) without timezone information, and the function recalculates it in the given timezone.&lt;/P&gt;
&lt;P&gt;For more info:&amp;nbsp;&lt;A href="https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.from_utc_timestamp.html#pyspark-sql-functions-from-utc-timestamp" target="_self"&gt;pyspark.sql.functions.from_utc_timestamp&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 04 Apr 2025 13:22:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/from-utc-time-gives-strange-results/m-p/114521#M44853</guid>
      <dc:creator>Advika</dc:creator>
      <dc:date>2025-04-04T13:22:39Z</dc:date>
    </item>
  </channel>
</rss>

