
Best practices for implementing early-arriving fact handling

Phani1

Hi All,

Could you please share best practices for handling early-arriving facts in Databricks when streaming data is processed in near real time with Structured Streaming?

There are many ways to handle this use case in batch or mini-batch processing. Specifically, we are looking for best practices for handling it with Structured Streaming in near real time.
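A minimal sketch of one commonly used streaming approach, assuming that facts whose customer is not yet in the dimension are parked and retried on later micro-batches instead of being dropped. In Databricks this logic would usually run inside a `foreachBatch` handler, with the pending rows kept in a Delta table; here it is plain Python so the retry behaviour is easy to follow. `PendingFactResolver`, its fields, and the `max_retries` cutoff are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PendingFactResolver:
    """Hypothetical helper: resolves fact rows against a customer dimension,
    parking rows whose customer is unknown and retrying them each micro-batch."""
    dim_keys: dict = field(default_factory=dict)  # customer_id -> surrogate key
    pending: list = field(default_factory=list)   # facts still waiting for their customer
    max_retries: int = 3                          # assumption: give up after N failed retries

    def add_dimension_rows(self, rows):
        """Register (customer_id, surrogate_key) pairs as dimension data arrives."""
        for customer_id, surrogate_key in rows:
            self.dim_keys[customer_id] = surrogate_key

    def process_batch(self, facts):
        """Handle one micro-batch; return (resolved, expired) fact rows."""
        resolved, still_pending, expired = [], [], []
        # Retry previously parked facts first, then the new arrivals.
        candidates = self.pending + [dict(f, retries=0) for f in facts]
        for fact in candidates:
            key = self.dim_keys.get(fact["customer_id"])
            if key is not None:
                # Customer is known now: emit the fact with its surrogate key.
                out = {k: v for k, v in fact.items() if k != "retries"}
                out["customer_sk"] = key
                resolved.append(out)
            elif fact["retries"] >= self.max_retries:
                # Waited long enough; hand off for manual or inferred-member handling.
                expired.append(fact)
            else:
                still_pending.append(dict(fact, retries=fact["retries"] + 1))
        self.pending = still_pending
        return resolved, expired
```

Parking keeps fact rows correctly keyed but delays them, so `max_retries` (or an equivalent time-based cutoff) should reflect how late dimension data can realistically arrive.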

Example:

[Image: example tables for the scenario described below]

Example of an early-arriving fact:

The tables below illustrate the scenario:

  • One record (highlighted in red) arrives in the SalesDetail transaction data, but the corresponding customer (C4) has not yet been loaded into the DimCustomer dimension.
  • In other words, the data for the fact table (FactSalesDetail) arrives before the corresponding dimension data (C4 in DimCustomer).
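Another widely used option for the C4 case is the "inferred member" (placeholder row) pattern from dimensional modelling: when a fact references an unknown customer, a placeholder DimCustomer row is inserted immediately so the fact can be loaded with a valid surrogate key, and that row is updated in place when the real customer record arrives. The in-memory sketch below assumes a Type 1 dimension and an "Unknown" default name; `CustomerDimension`, `lookup_or_infer`, and `load_facts` are hypothetical. On Databricks the lookup-or-insert step would typically be a Delta `MERGE` executed per micro-batch.

```python
from dataclasses import dataclass


@dataclass
class CustomerRow:
    customer_sk: int
    customer_id: str
    name: str
    is_inferred: bool  # True while the row is only a placeholder


class CustomerDimension:
    """Hypothetical in-memory stand-in for DimCustomer."""

    def __init__(self):
        self.rows = {}  # customer_id -> CustomerRow
        self._next_sk = 1

    def lookup_or_infer(self, customer_id):
        """Return the surrogate key, inserting an inferred placeholder if missing."""
        row = self.rows.get(customer_id)
        if row is None:
            row = CustomerRow(self._next_sk, customer_id, "Unknown", True)
            self.rows[customer_id] = row
            self._next_sk += 1
        return row.customer_sk

    def upsert(self, customer_id, name):
        """Apply a real dimension record. An inferred row is completed in place,
        keeping its surrogate key; an existing real row is overwritten (Type 1)."""
        row = self.rows.get(customer_id)
        if row is None:
            self.rows[customer_id] = CustomerRow(self._next_sk, customer_id, name, False)
            self._next_sk += 1
        else:
            row.name = name
            row.is_inferred = False


def load_facts(facts, dim):
    """Attach a surrogate key to every fact, never dropping or delaying a row."""
    return [dict(f, customer_sk=dim.lookup_or_infer(f["customer_id"])) for f in facts]
```

Because the placeholder keeps its surrogate key when the real customer data arrives, facts already written with that key need no update.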

Regards,

Phani


Phani1

Greetings, team. Does anyone have suggestions regarding this question?
