Best practices for implementing early arriving fact handling
08-27-2024 03:18 AM - edited 08-27-2024 03:21 AM
Hi All,
Could you please share best practices for handling early arriving facts in Databricks when streaming data is processed in near real time with Structured Streaming?
There are many ways to handle this use case in batch or mini-batch processing. Specifically, we are looking for best practices for handling it with Structured Streaming in near real time.
Example of an early arriving fact:
Please refer to the tables below, which illustrate the early arriving fact scenario.
- One record (highlighted in red) is received in the SalesDetail transaction data, but the corresponding customer (C4) has not yet been loaded into the DimCustomer dimension.
- In other words, the data for the fact table (FactSalesDetail) arrived earlier than the corresponding dimension data (C4 in DimCustomer).
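To make the scenario concrete, here is a minimal plain-Python sketch (not Spark code) of the check described above. The order IDs, amounts, and surrogate keys are made-up sample values, and `split_early_arriving` is a hypothetical helper name; only the customer key C4 and the table names come from the example. In Structured Streaming, the same check would presumably be a join of each micro-batch against DimCustomer.

```python
# Hypothetical sample data; only the customer key "C4" comes from the example.
dim_customer = {
    "C1": {"customer_sk": 1},
    "C2": {"customer_sk": 2},
    "C3": {"customer_sk": 3},
}

sales_detail = [
    {"order_id": 101, "customer_id": "C1", "amount": 50.0},
    {"order_id": 102, "customer_id": "C4", "amount": 75.0},  # C4 is not in DimCustomer yet
]


def split_early_arriving(facts, dimension):
    """Separate fact rows whose customer key is missing from the dimension."""
    matched, early = [], []
    for row in facts:
        if row["customer_id"] in dimension:
            matched.append(row)
        else:
            early.append(row)  # early arriving fact: dimension row not loaded yet
    return matched, early


matched, early = split_early_arriving(sales_detail, dim_customer)
print("matched:", [r["order_id"] for r in matched])
print("early arriving:", [r["customer_id"] for r in early])
```

The open question is what to do with the rows that end up in `early` when this runs continuously on a stream.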
Regards,
Phani
- Labels: Delta
08-28-2024 07:41 AM
Greetings team, does anyone have suggestions regarding the question above?

