12-05-2022 09:31 AM
Hi @THIAM HUAT TAN
In your notebook, you are creating an integer days_between with the code
days_between = (last_date - first_date).days + 10
Logically speaking, what the notebook is trying to do is fetch all the dates between two dates in order to do a forecast.
So, to get a complete list of dates between the start date and x days into the future (x = days_between), you are using spark.range(0, days_between). What it does is return a single column of integers from 0 up to days_between - 1, just like Python's range works. By default, the column is named "id".
You are then renaming this id column to "days", just for simplicity, to show that it holds the number of days to add to a reference column. That reference column, init_date, is the one you create with .withColumn('init_date', lit(first_date)). From these two you are creating a new column, date, which contains the calculated date.
The date calculated would be nothing but first_date + days.
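If it helps to see the arithmetic on its own, here is a minimal plain-Python sketch of the same logic. It is not the Spark code from your notebook, only an analogue: the two sample dates are made up for illustration, and the list of dicts stands in for the DataFrame rows with the days, init_date and date columns.

```python
from datetime import date, timedelta

# Hypothetical sample dates (assumption); the notebook takes these from its own data.
first_date = date(2022, 11, 1)
last_date = date(2022, 11, 30)

# Same expression as in the notebook: the span of the history plus 10 extra days.
days_between = (last_date - first_date).days + 10

# range(0, n) yields 0 .. n-1, the same values spark.range(0, n) puts in its "id" column.
rows = [
    {"days": d, "init_date": first_date, "date": first_date + timedelta(days=d)}
    for d in range(0, days_between)
]

print(days_between)                        # number of generated rows
print(rows[0]["date"], rows[-1]["date"])   # first and last generated dates
```

Because the end of the range is exclusive, the last generated date with these sample values is 9 days after last_date, not 10.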
Hope this helps...
Cheers..