
Help design my streaming pipeline

rt-slowth
Contributor

### Data Source
- AWS RDS
- Database migration tasks have been created using AWS DMS
- The resulting CDC data is stored in a specific S3 bucket

### Data frequency
- Once a day (exact time unknown, sometime after 6 PM)

### Development environment
- Databricks
- Delta Live Tables (DLT) on Databricks

### Data Status
- CLOSE_DT, CURR_F_CD, and CURR_T_CD together form the primary key and are used as the JOIN conditions
- CLOSE_DT is of DATE type
- Data arrives from the source (RDS) once a day on weekdays
- This data is written to S3 as CDC data via AWS DMS

### Processing requirements
- No data arrives from the source on non-business days (weekends and holidays), but those dates must still be populated from the most recent available data.
- On weekdays data arrives once a day, and the presence or absence of rows for a given CLOSE_DT indicates whether that day's data has arrived yet.
- For example, suppose today is 2023-12-28.
- It is not known in advance when the rows with a CLOSE_DT of 2023-12-28 will arrive.
- Until they arrive, the 2023-12-28 data is generated from the most recent data (2023-12-27).
- When the real 2023-12-28 data arrives, it replaces the generated rows.
- On holidays no data arrives at all, so each day's data must be generated from the most recent data.
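
To make the requirement concrete, here is a minimal plain-Python sketch of the fill-forward rule described above. It is not a Delta Live Tables implementation, only an illustration of the expected result. The function name `fill_forward`, the `RATE` value column, the `IS_FILLED` flag, and the currency codes are all assumptions for illustration; only CLOSE_DT, CURR_F_CD, and CURR_T_CD come from the actual data.

```python
from datetime import date, timedelta


def fill_forward(rows, start, end):
    """Return one row per currency pair for every calendar day in [start, end].

    rows: list of dicts with keys CLOSE_DT (date), CURR_F_CD, CURR_T_CD and a
    hypothetical RATE column. A day with real data keeps it (IS_FILLED=False);
    a day without data copies the most recent earlier row for the same pair
    (IS_FILLED=True). Days before the first known row of a pair are skipped.
    """
    by_key = {(r["CLOSE_DT"], r["CURR_F_CD"], r["CURR_T_CD"]): r for r in rows}
    pairs = sorted({(r["CURR_F_CD"], r["CURR_T_CD"]) for r in rows})
    out = []
    for f_cd, t_cd in pairs:
        # Seed with the latest real row before the window, if there is one.
        earlier = [
            r for r in rows
            if (r["CURR_F_CD"], r["CURR_T_CD"]) == (f_cd, t_cd)
            and r["CLOSE_DT"] < start
        ]
        last = max(earlier, key=lambda r: r["CLOSE_DT"], default=None)
        day = start
        while day <= end:
            real = by_key.get((day, f_cd, t_cd))
            if real is not None:
                last = real
                out.append({**real, "IS_FILLED": False})
            elif last is not None:
                out.append({**last, "CLOSE_DT": day, "IS_FILLED": True})
            day += timedelta(days=1)
    return out


# Before the 2023-12-28 data arrives: that day is a copy of 2023-12-27.
source = [
    {"CLOSE_DT": date(2023, 12, 27), "CURR_F_CD": "USD",
     "CURR_T_CD": "KRW", "RATE": 1294.5},
]
before = fill_forward(source, date(2023, 12, 27), date(2023, 12, 28))

# After it arrives: recomputing swaps the generated row for the real one.
source.append(
    {"CLOSE_DT": date(2023, 12, 28), "CURR_F_CD": "USD",
     "CURR_T_CD": "KRW", "RATE": 1288.0}
)
after = fill_forward(source, date(2023, 12, 27), date(2023, 12, 28))
```

Because the output is recomputed from the real rows each time, the "swap" needs no special handling in this sketch: a generated row simply stops being produced once a real row with the same key exists. Whether the pipeline can afford a full recompute, or has to update only the affected dates, is the part I am unsure about.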
