cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Knowledge Sharing Hub
Dive into a collaborative space where members like YOU can exchange knowledge, tips, and best practices. Join the conversation today and unlock a wealth of collective wisdom to enhance your experience and drive success.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Build & Refresh a Calendar Dates Table

ThierryBa
New Contributor

Introduction

Maintaining accurate and up-to-date calendar date tables is crucial for reliable reporting, yet manual updates can be time-consuming and prone to error. This fundamental component serves as the backbone for date-based analysis, enabling accurate reporting across various date dimensions. Whether you are tracking sales trends, analysing employee performance, or forecasting future metrics, a robust calendar dates table is essential for deriving meaningful insights from your data.

Databricks, with its SQL engine, offers an ideal environment for creating and maintaining such a crucial element. By leveraging Databricksโ€™ capabilities, organisations can automate the process of generating and updating their calendar dates table.

In this blog post, we will explore how to harness Databricksโ€™ serverless SQL and workflow (scheduling) functionalities to build and refresh a calendar dates table tailored for Business Intelligence applications.

We will cover the creation of the table itself, setting up a workflow. By the end, you will have a clear understanding of how to implement this solution.

Create the date table

Previously, we used to leverage a Python-driven notebook to build and refresh our data. We decided to now leverage SQL instead of Python as most of our customers are more at ease with the latter.

First things first, we need to define the start and end date of our table. In order to do so, we will leverage the SQL function sequence. Long story short, the sequence function is used to create an array of elements from start to stop and incrementing by a step. The below code will generate an array containing dates from January 1st 2020, to January 1st 2050.

ThierryBa_0-1726544617449.png

 

Now the second step is to literally explode that array into proper rows. In order to do so, we will leverage the SQL function explode. Basically, this function will return a set of rows by un-nesting our array (collection).

ThierryBa_1-1726544617449.png

 

In two very easy and simple steps we now have the foundation of our date dimension.

Next steps is to capitalise on those two steps and start creating various date related columns based on the value of the calendar_date column. To do so, we will be using a CTE (Common table Expression) definition to factorise and reuse our code in a neat manner.

Now comes the most interesting part which is defining and adding the Business Intelligence relevant date attributes. The below query enriches our table with additional attributes useful for BI reporting, including fiscal periods (assuming a fiscal year starting April 1).

Many of the built-in SQL functions available in Databricks will look very familiar to experienced data engineers who have worked with on-premises relational database management systems (RDBMS) like Oracle, SQL Server, or PostgreSQL. This similarity eases the transition for those moving from traditional data warehousing to Databricks.

For instance, in our calendar dimension example, functions such as year(), month(), dayofweek(), and weekofyear() are nearly identical in syntax and functionality to their counterparts in traditional RDBMS systems. This allows data engineers to leverage their existing SQL knowledge when building date-related dimensions and fact tables in Databricks.

with calendarDates as (

  select

    explode(array_dates) as calendar_date

  from

    (

      select

        sequence(

          make_date(2020, 01, 01),

          make_date(2050, 01, 01),

          interval 1 day

        ) as array_dates

    )

)

select

  10000 * year(calendar_date) + 100 * month(calendar_date) + day(calendar_date) as calendar_date_key,

  to_date(calendar_date) as calendar_date,

  year(calendar_date) calendar_year,

  month(calendar_date) calendar_month,

  day(calendar_date) calendar_day_of_month,

  date_format(calendar_date, 'EEEE MMMM dd yyyy') calendar_date_full,

  date_format(calendar_date, 'EEEE') calendar_day_name,

  date_add(to_date(calendar_date), -1) as calendar_date_previous_day,

  date_add(to_date(calendar_date), 1) as calendar_date_next_day,

  case

    when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)

    else date_add(calendar_date, -(weekday(calendar_date)))

  end as calendar_week_start,

  date_add(

    case

      when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)

      else date_add(calendar_date, -(weekday(calendar_date)))

    end,

    6

  ) as calendar_week_end,

  weekday(calendar_date) + 1 as calendar_week_day,

  weekofyear(calendar_date) calendar_week_of_year,

  date_format(calendar_date, 'MMMM yyyy') calendar_month_year,

  date_format(calendar_date, 'MMMM') calendar_month_name,

  date_add(last_day(add_months(calendar_date, -1)), 1) calendar_first_day_of_month,

  last_day(calendar_date) calendar_last_day_of_month,

  case

    when month(calendar_date) in (1, 2, 3) then 1

    when month(calendar_date) in (4, 5, 6) then 2

    when month(calendar_date) in (7, 8, 9) then 3

    else 4

  end AS fiscal_quarter,

  year(date_add(calendar_date, 89)) AS fiscal_year,

  case

    when to_date(now()) = calendar_date then true

    else false

  end as current_day,

  CASE

    WHEN to_date(now()) BETWEEN (

      case

        when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)

        else date_add(calendar_date, -(weekday(calendar_date)))

      end

    )

    AND (

      date_add(

        case

          when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)

          else date_add(calendar_date, -(weekday(calendar_date)))

        end,

        6

      )

    )

    THEN true

    else false

  end as current_week,

  case

    when month(to_date(now())) = month(calendar_date)

    and year(to_date(now())) = year(calendar_date) then true

    else false

  end as current_month,

  case

    when year(to_date(now())) = year(calendar_date) then true

    else false

  end as current_year

from

  calendarDates

limit

  10;

Create our notebook

Now that we defined our SQL, we need to encapsulate it inside a Create Table statement, then we will parametrise our notebook to pass 2 parameters (widgets) defining which catalog and schema to use.

We will put that in motion directly in a Notebook as per the below:

ThierryBa_2-1726544617450.png

 

The full notebook can be accessed here:

-- first let's set the timezone to the correct location. UTC is behind NZST or NZDT.
SET timezone = Pacific/Auckland;

-- let's use the right catalog and schema passed as parameters
use catalog ${catalog_name};
use schema ${schema_name};

-- create a managed table in our Unity Catalog
create or replace table calendar as
-- CTE to simplify our SQL
with calendarDates as (
select
explode(array_dates) as calendar_date
from
(
select
sequence(
make_date(2020, 01, 01), -- start date
make_date(2050, 01, 01), -- end date
interval 1 day -- incremental step
) as array_dates
)
)
-- the SQL transforming our main calendar_date into all the colunmns we will be requiring
select
10000 * year(calendar_date) + 100 * month(calendar_date) + day(calendar_date) as calendar_date_key, -- calendar table primary key
to_date(calendar_date) as calendar_date,
year(calendar_date) calendar_year,
month(calendar_date) calendar_month,
day(calendar_date) calendar_day_of_month,
date_format(calendar_date, 'EEEE MMMM dd yyyy') calendar_date_full,
date_format(calendar_date, 'EEEE') calendar_day_name,
date_add(to_date(calendar_date), -1) as calendar_date_previous_day,
date_add(to_date(calendar_date), 1) as calendar_date_next_day,
case
when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)
else date_add(calendar_date, -(weekday(calendar_date)))
end as calendar_week_start, -- start of week date
date_add(
case
when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)
else date_add(calendar_date, -(weekday(calendar_date)))
end,
6
) as calendar_week_end, -- end of week date
weekday(calendar_date) + 1 as calendar_week_day,
weekofyear(calendar_date) calendar_week_of_year,
date_format(calendar_date, 'MMMM yyyy') calendar_month_year,
date_format(calendar_date, 'MMMM') calendar_month_name,
date_add(last_day(add_months(calendar_date, -1)), 1) calendar_first_day_of_month,
last_day(calendar_date) calendar_last_day_of_month,
case
when month(calendar_date) in (1, 2, 3) then 1
when month(calendar_date) in (4, 5, 6) then 2
when month(calendar_date) in (7, 8, 9) then 3
else 4
end AS fiscal_quarter,
year(date_add(calendar_date, 89)) AS fiscal_year,
case
when to_date(now()) = calendar_date then true
else false
end as current_day, -- true if calendar day is today
CASE
WHEN to_date(now()) BETWEEN (
case
when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)
else date_add(calendar_date, -(weekday(calendar_date)))
end
)
AND (
date_add(
case
when date_add(calendar_date, (weekday(calendar_date) + 1) -1) = calendar_date then to_date(calendar_date)
else date_add(calendar_date, -(weekday(calendar_date)))
end,
6
)
) THEN true
else false
end as current_week, -- true if calendar day is in current week
case
when month(to_date(now())) = month(calendar_date)
and year(to_date(now())) = year(calendar_date) then true
else false
end as current_month, -- true if calendar day is in current month
case
when year(to_date(now())) = year(calendar_date) then true
else false
end as current_year -- true is calendar days is in current year
from
calendarDates
-- modify our calendar table in order to set a PK (primary key)
ALTER TABLE calendar
ALTER COLUMN calendar_date_key SET NOT NULL;
alter table calendar add primary key (calendar_date_key);

Next steps:

  • Modify the notebook to add the columns that you need for your use case.
  • Test and test.
  • Create a workflow to schedule and run the notebook.
  • Create Dimension views of your calendar and start using it!
Data and Analytics Practice Lead
3 REPLIES 3

AdamNevin
New Contributor

Very helpful, thanks for sharing! 

ConnorK
New Contributor II

Thanks, automating tasks like these can be a life saver ๐Ÿ˜…

may-tun
New Contributor

Nice article, very informative! 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group