<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Error in Databricks code? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18647#M12414</link>
    <description>&lt;P&gt;Yes, it should work similarly, thanks.&lt;/P&gt;&lt;P&gt;I suppose cast('int') and cast(IntegerType()) are identical?&lt;/P&gt;</description>
    <pubDate>Wed, 07 Dec 2022 02:31:58 GMT</pubDate>
    <dc:creator>THIAM_HUATTAN</dc:creator>
    <dc:date>2022-12-07T02:31:58Z</dc:date>
    <item>
      <title>Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18641#M12408</link>
      <description>&lt;P&gt;&lt;A href="https://www.databricks.com/notebooks/recitibikenycdraft/data-preparation.html" target="test_blank"&gt;https://www.databricks.com/notebooks/recitibikenycdraft/data-preparation.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Could someone help me understand this code from &lt;B&gt;Step 3: Prepare Calendar Info&lt;/B&gt; in that notebook?&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# derive complete list of dates between first and last dates
dates = (
  spark
    .range(0,days_between).withColumnRenamed('id','days')
    .withColumn('init_date', lit(first_date))
    .selectExpr('cast(date_add(init_date, days) as timestamp) as date')
  )&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;What does 'days' refer to in the final selectExpr call? It does not seem to be defined. Is it meant to be 'days_between'? If I replace 'days' with 'days_between', it also breaks, because it expects an integer value and not a variable.&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2022 09:26:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18641#M12408</guid>
      <dc:creator>THIAM_HUATTAN</dc:creator>
      <dc:date>2022-12-05T09:26:56Z</dc:date>
    </item>
    <item>
      <title>Re: Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18642#M12409</link>
      <description>&lt;P&gt;@THIAM HUAT TAN&lt;/P&gt;&lt;P&gt;You are renaming the "id" column to "days", so the "days" column should hold integer values. In the selectExpr, you are doing a date addition of init_date and the values from the "days" column.&lt;/P&gt;&lt;P&gt;I have recreated the same with a sample data set.&lt;/P&gt;&lt;P&gt;Note: Kindly check whether the "id" column is an integer in your case.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="snap_sample_rec"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1046i7C95BBC63D23DA76/image-size/large?v=v2&amp;amp;px=999" role="button" title="snap_sample_rec" alt="snap_sample_rec" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2022 14:00:59 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18642#M12409</guid>
      <dc:creator>Harun</dc:creator>
      <dc:date>2022-12-05T14:00:59Z</dc:date>
    </item>
    <item>
      <title>Re: Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18643#M12410</link>
      <description>&lt;P&gt;Hi @THIAM HUAT TAN&lt;/P&gt;&lt;P&gt;In your notebook, you are creating an integer variable days_between with the code&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;days_between = (last_date - first_date).days + 10&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Logically speaking, what the notebook is trying to do is generate all the dates between two dates in order to do a forecast.&lt;/P&gt;&lt;P&gt;So, to get a complete list of dates from the start date up to x days into the future (x = days_between), you are using spark.range(0,days_between). It produces a single column of integers, much like Python's range does. By default, the column is named "id".&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1043iD985C7905F9656AB/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;You are renaming this "id" column to "days" simply to make clear that you are adding that many days to a reference column (which you create with .withColumn('init_date', lit(first_date))), and the result goes into a new column, date, which contains the calculated date.&lt;/P&gt;&lt;P&gt;The calculated date is simply first_date + days.&lt;/P&gt;&lt;P&gt;Hope this helps.&lt;/P&gt;&lt;P&gt;Cheers.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Dec 2022 17:31:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18643#M12410</guid>
      <dc:creator>UmaMahesh1</dc:creator>
      <dc:date>2022-12-05T17:31:12Z</dc:date>
    </item>
    <item>
      <title>Re: Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18644#M12411</link>
      <description>&lt;P&gt;Thanks, Harun, for your example. However, I am still stuck. Initially I thought the time component of the date was interfering with date_add, so I converted it with to_date, but the same error message appears.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1042i92B4A340CB03CBBA/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;The next statement gives the error below:&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1049i7D835F7F7F98B7E5/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;I would appreciate it if you could point me to the error. Thanks.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 02:08:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18644#M12411</guid>
      <dc:creator>THIAM_HUATTAN</dc:creator>
      <dc:date>2022-12-06T02:08:44Z</dc:date>
    </item>
    <item>
      <title>Re: Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18645#M12412</link>
      <description>&lt;P&gt;With this line added, the error disappears &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/1047i2C1B31AA68840454/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 02:38:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18645#M12412</guid>
      <dc:creator>THIAM_HUATTAN</dc:creator>
      <dc:date>2022-12-06T02:38:37Z</dc:date>
    </item>
    <item>
      <title>Re: Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18646#M12413</link>
      <description>&lt;P&gt;@THIAM HUAT TAN Kindly cast the "id" column to IntegerType, as I did for the "arrival_date" column in my sample record snapshot above. That will make the code work.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Dec 2022 09:41:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18646#M12413</guid>
      <dc:creator>Harun</dc:creator>
      <dc:date>2022-12-06T09:41:16Z</dc:date>
    </item>
    <item>
      <title>Re: Error in Databricks code?</title>
      <link>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18647#M12414</link>
      <description>&lt;P&gt;Yes, it should work similarly, thanks.&lt;/P&gt;&lt;P&gt;I suppose cast('int') and cast(IntegerType()) are identical?&lt;/P&gt;</description>
      <pubDate>Wed, 07 Dec 2022 02:31:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/error-in-databricks-code/m-p/18647#M12414</guid>
      <dc:creator>THIAM_HUATTAN</dc:creator>
      <dc:date>2022-12-07T02:31:58Z</dc:date>
    </item>
  </channel>
</rss>

