<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Pyspark to_date not coping with single digit Day or Month in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98318#M39687</link>
    <description>&lt;P&gt;Hi there, I have a simple PySpark to_date call, but it fails when the day or month is a single digit (1-9):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="RobDineen_0-1731324661487.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12806i19018F2EF7F5AE55/image-size/medium?v=v2&amp;amp;px=400" role="button" title="RobDineen_0-1731324661487.png" alt="RobDineen_0-1731324661487.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Is there a nice, easy way to get round this?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Rob&lt;/P&gt;</description>
    <pubDate>Mon, 11 Nov 2024 11:31:48 GMT</pubDate>
    <dc:creator>RobDineen</dc:creator>
    <dc:date>2024-11-11T11:31:48Z</dc:date>
    <item>
      <title>Pyspark to_date not coping with single digit Day or Month</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98318#M39687</link>
      <description>&lt;P&gt;Hi there, I have a simple PySpark to_date call, but it fails when the day or month is a single digit (1-9):&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="RobDineen_0-1731324661487.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12806i19018F2EF7F5AE55/image-size/medium?v=v2&amp;amp;px=400" role="button" title="RobDineen_0-1731324661487.png" alt="RobDineen_0-1731324661487.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Is there a nice, easy way to get round this?&lt;/P&gt;&lt;P&gt;Regards&lt;/P&gt;&lt;P&gt;Rob&lt;/P&gt;</description>
      <pubDate>Mon, 11 Nov 2024 11:31:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98318#M39687</guid>
      <dc:creator>RobDineen</dc:creator>
      <dc:date>2024-11-11T11:31:48Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark to_date not coping with single digit Day or Month</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98319#M39688</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/126224"&gt;@RobDineen&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;You may try setting the timeParserPolicy to meet your use case needs.&lt;/P&gt;
&lt;DIV&gt;
&lt;P&gt;When set to LEGACY, java.text.SimpleDateFormat is used for formatting and parsing&amp;nbsp;dates/timestamps in a locale-sensitive manner, which was the behaviour before Spark 3.0.&lt;/P&gt;
&lt;P&gt;When set to CORRECTED, classes from the java.time.* packages are used for the same purpose.&amp;nbsp;The default value is EXCEPTION, under which a RuntimeException is thrown when the two approaches would give different&amp;nbsp;results.&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;spark.conf.set("spark.sql.legacy.timeParserPolicy","LEGACY")&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;or&lt;/SPAN&gt;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;spark.sql("set spark.sql.legacy.timeParserPolicy=LEGACY")&lt;/STRONG&gt;&lt;/P&gt;
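&lt;P&gt;If changing the parser policy is not an option, another approach could be to zero-pad the day and month before parsing. Below is a minimal plain-Python sketch of that normalisation. The helper name pad_date_parts, the day/month/year order, the "/" separator and the sample values are all assumptions for illustration, since the thread does not show the actual data.&lt;/P&gt;

```python
from datetime import datetime


def pad_date_parts(raw, sep="/"):
    """Zero-pad each part of a date string, e.g. '1/2/2024' becomes '01/02/2024'."""
    # zfill(2) only pads one-character parts; '11' and '2024' are left unchanged
    return sep.join(part.zfill(2) for part in raw.strip().split(sep))


# Hypothetical sample values (assumed day/month/year order)
samples = ["1/2/2024", "11/11/2024", "9/12/2024"]

padded = [pad_date_parts(s) for s in samples]
parsed = [datetime.strptime(p, "%d/%m/%Y").date() for p in padded]

print(padded)
print(parsed)
```

&lt;P&gt;In PySpark the same idea could probably be applied to a column with lpad or format_string("%02d", ...) from pyspark.sql.functions before calling to_date, but that has not been checked against the data in this thread.&lt;/P&gt;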
&lt;/DIV&gt;</description>
      <pubDate>Mon, 11 Nov 2024 11:52:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98319#M39688</guid>
      <dc:creator>VZLA</dc:creator>
      <dc:date>2024-11-11T11:52:10Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark to_date not coping with single digit Day or Month</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98333#M39698</link>
      <description>&lt;P&gt;I have been trying to solve it by adding a new column on the fly:&lt;/P&gt;&lt;P&gt;if DayofMonth is in (1,2,3,4,5,6,7,8,9), put a 0 in front; otherwise leave it as is.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="RobDineen_0-1731332791231.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12809iAD174A8FBB134BF4/image-size/medium?v=v2&amp;amp;px=400" role="button" title="RobDineen_0-1731332791231.png" alt="RobDineen_0-1731332791231.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Obviously I'm inserting the 0 incorrectly, but I'm wondering how to do it properly.&lt;/P&gt;&lt;P&gt;Nearly there:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="RobDineen_1-1731333144746.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/12810i8031007D71A67841/image-size/medium?v=v2&amp;amp;px=400" role="button" title="RobDineen_1-1731333144746.png" alt="RobDineen_1-1731333144746.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Nov 2024 13:52:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98333#M39698</guid>
      <dc:creator>RobDineen</dc:creator>
      <dc:date>2024-11-11T13:52:35Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark to_date not coping with single digit Day or Month</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98423#M39718</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/34618"&gt;@VZLA&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas on the workaround in my previous post? I'm nearly there.&lt;/P&gt;</description>
      <pubDate>Tue, 12 Nov 2024 09:53:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98423#M39718</guid>
      <dc:creator>RobDineen</dc:creator>
      <dc:date>2024-11-12T09:53:22Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark to_date not coping with single digit Day or Month</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98694#M39804</link>
      <description>&lt;P&gt;Resolved using format_string:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;from pyspark.sql.functions import format_string, when&lt;/SPAN&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;SPAN&gt;dff = df.withColumn("DayofMonthFormatted", when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), format_string("0%d", df.DayofMonth)).otherwise(df.DayofMonth))&lt;/SPAN&gt;&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Wed, 13 Nov 2024 15:45:18 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-to-date-not-coping-with-single-digit-day-or-month/m-p/98694#M39804</guid>
      <dc:creator>RobDineen</dc:creator>
      <dc:date>2024-11-13T15:45:18Z</dc:date>
    </item>
  </channel>
</rss>

