Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

APPLY INTO Highdate instead of NULL for __END_AT

c-thiel
New Contributor

I really like the APPLY INTO function for tracking changes and historizing them as SCD2.
However, I am a bit confused that current records get an __END_AT of NULL. Typically, __END_AT should be a high date (e.g., 9999-12-31) or similar, so that a point-in-time query can simply use:

SELECT * FROM TABLE where '<my-date>' BETWEEN __START_AT and __END_AT

Right now I believe I have to use COALESCE on __END_AT to get correct results. Is there an option to set a high date instead of NULL for __END_AT?

1 REPLY 1

mark_ott
Databricks Employee

The APPLY INTO function for SCD2 historization sets the __END_AT field of current records to NULL rather than a high date like 9999-12-31. This is by design: it reflects that the record is still current and has no defined end date yet.

Currently, there is no built-in option in APPLY INTO to set a high-date value such as 9999-12-31 in place of NULL for the __END_AT field. The common and recommended practice when querying such SCD2 tables is to apply COALESCE to __END_AT in point-in-time queries, for example:

SELECT * FROM table WHERE '<my-date>' BETWEEN __START_AT AND COALESCE(__END_AT, '9999-12-31')

This approach avoids assuming a default date on data insertion and makes explicit that a NULL __END_AT means an open-ended, current record.
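To see why the COALESCE is needed, here is a minimal, Databricks-independent sketch using an in-memory SQLite table (the table and column names simply mirror the thread's example): a plain BETWEEN predicate evaluates to NULL for the current record and silently drops it, while COALESCE restores it at query time.

```python
import sqlite3

# Illustrative sketch: a two-version SCD2 history for one business key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim (id INTEGER, __START_AT TEXT, __END_AT TEXT)")
conn.executemany(
    "INSERT INTO dim VALUES (?, ?, ?)",
    [
        (1, "2023-01-01", "2023-06-30"),  # closed historical version
        (1, "2023-07-01", None),          # current version: open-ended
    ],
)

# Naive filter: BETWEEN against a NULL __END_AT yields NULL, so the
# current record is excluded from the result.
naive = conn.execute(
    "SELECT id FROM dim WHERE '2023-08-15' BETWEEN __START_AT AND __END_AT"
).fetchall()
print(naive)  # []

# COALESCE substitutes a high date only at query time, keeping the stored
# NULL intact as the marker for 'still current'.
fixed = conn.execute(
    "SELECT id FROM dim WHERE '2023-08-15' BETWEEN __START_AT "
    "AND COALESCE(__END_AT, '9999-12-31')"
).fetchall()
print(fixed)  # [(1,)]
conn.close()
```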

In summary:

  • APPLY INTO currently sets __END_AT to NULL for current records (no explicit high date).

  • There is no native setting to replace NULL with a high date in APPLY INTO.

  • Using COALESCE in your queries is the standard way to handle a NULL __END_AT in point-in-time filtering.

This design aligns with Delta Live Tables and other Databricks implementations, where a NULL __END_AT signifies an active record's open-ended coverage, and your query patterns should accommodate that.
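If repeating the COALESCE in every query becomes tedious, one workaround is to layer a view over the SCD2 table that exposes a non-NULL effective end date. This is a sketch, not a built-in feature; the table, view, and column names below are illustrative placeholders:

```sql
-- Illustrative sketch: wrap the SCD2 output in a view so downstream
-- point-in-time queries can use a plain BETWEEN.
CREATE OR REPLACE VIEW my_scd2_view AS
SELECT
  *,
  COALESCE(__END_AT, '9999-12-31') AS effective_end_at
FROM my_scd2_table;

-- Point-in-time query without repeating COALESCE:
SELECT * FROM my_scd2_view
WHERE '<my-date>' BETWEEN __START_AT AND effective_end_at;
```

The stored data keeps NULL as the marker for "still current", while consumers of the view get the high-date semantics the question asks for.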