<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Re: TimestampFormat issue in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/timestampformat-issue/m-p/34823#M25540</link>
    <description>&lt;P&gt;You have probably solved this issue by now, but for the sake of anyone who encounters it again, here is the solution that worked for me:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 28 Mar 2023 20:25:13 GMT</pubDate>
    <dc:creator>searchs</dc:creator>
    <dc:date>2023-03-28T20:25:13Z</dc:date>
    <item>
      <title>TimestampFormat issue</title>
      <link>https://community.databricks.com/t5/data-engineering/timestampformat-issue/m-p/34822#M25539</link>
      <description>&lt;P&gt;The Databricks notebook failed yesterday due to a timestamp format issue.&lt;/P&gt;&lt;P&gt;Error:&lt;/P&gt;&lt;P&gt;"SparkUpgradeException: You may get a different result due to the upgrading of Spark 3.0: Fail to parse '2022-08-10 00:00:14.2760000' in the new parser. You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0, or set to CORRECTED and treat it as an invalid datetime string."&lt;/P&gt;&lt;P&gt;The notebook had been running fine before; the ts column contains timestamp values like "2022-08-07T23:59:57.9740000".&lt;/P&gt;&lt;P&gt;We use an explicit timestampFormat of 'yyyy-MM-dd HH:mm:ss.SSS' when reading the CSV files.&lt;/P&gt;&lt;P&gt;However, we started getting null timestamp values because the strings could not be parsed.&lt;/P&gt;&lt;P&gt;So I changed the format to 'yyyy-MM-dd HH:mm:ss.SSSSSSS' and that fixed one of the objects, but the issue remained for another.&lt;/P&gt;&lt;P&gt;When I removed the timestampFormat option completely, it worked for this last object as well.&lt;/P&gt;&lt;P&gt;I am wondering what changed on the Databricks cluster that it started failing; the timestamp values in the files are in the same format as before.&lt;/P&gt;&lt;P&gt;Here is the function, without the timestampFormat option, that works:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def ReadRawCSV(filesToProcess, header, delimiter, schema_struct):
  delta_df = spark.read.options(header=header, delimiter=delimiter).schema(schema_struct).csv(filesToProcess)
  return delta_df&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 11 Aug 2022 05:31:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/timestampformat-issue/m-p/34822#M25539</guid>
      <dc:creator>irfanaziz</dc:creator>
      <dc:date>2022-08-11T05:31:58Z</dc:date>
    </item>
    <item>
      <title>Re: TimestampFormat issue</title>
      <link>https://community.databricks.com/t5/data-engineering/timestampformat-issue/m-p/34823#M25540</link>
      <description>&lt;P&gt;You have probably solved this issue by now, but for the sake of anyone who encounters it again, here is the solution that worked for me:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 28 Mar 2023 20:25:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/timestampformat-issue/m-p/34823#M25540</guid>
      <dc:creator>searchs</dc:creator>
      <dc:date>2023-03-28T20:25:13Z</dc:date>
    </item>
  </channel>
</rss>

