<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Help design my streaming pipeline in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/help-design-my-streaming-pipeline/m-p/55865#M30448</link>
    <description>&lt;P&gt;### Data Source&lt;BR /&gt;- AWS RDS&lt;BR /&gt;- Database migration tasks have been created using AWS DMS&lt;BR /&gt;- The relevant CDC information is stored in a specific S3 bucket&lt;/P&gt;&lt;P&gt;### Data frequency&lt;BR /&gt;- Once a day (the exact time is unknown, sometime after 6pm)&lt;/P&gt;&lt;P&gt;### Development environment&lt;BR /&gt;- Databricks&lt;BR /&gt;- Delta Live Tables (Databricks)&lt;/P&gt;&lt;P&gt;### Data Status&lt;BR /&gt;- CLOSE_DT, CURR_F_CD, and CURR_T_CD form the primary key and are used as JOIN conditions&lt;BR /&gt;- CLOSE_DT is of DATE type&lt;BR /&gt;- Data comes in from the source (RDS) once a day on weekdays.&lt;BR /&gt;- This data is written to S3 as CDC records via AWS DMS&lt;/P&gt;&lt;P&gt;### Processing requirements&lt;BR /&gt;- No data comes into the source on non-weekdays and holidays, but those days must still be filled with the most recent data.&lt;BR /&gt;- Data comes in once a day on weekdays, and the presence or absence of a specific CLOSE_DT can be used to determine whether today's data has come in.&lt;BR /&gt;- For example, let's say today is 2023-12-28.&lt;BR /&gt;- You don't know when data with a CLOSE_DT of 2023-12-28 will come in today.&lt;BR /&gt;- So until the data comes in, you create the 2023-12-28 data from the most recent data (CLOSE_DT 2023-12-27).&lt;BR /&gt;- When the actual 2023-12-28 data comes in, it replaces the generated data.&lt;BR /&gt;- No data comes in at all on holidays, so each holiday's data must be generated from the most recent available data.&lt;/P&gt;</description>
    <pubDate>Thu, 28 Dec 2023 00:54:24 GMT</pubDate>
    <dc:creator>rt-slowth</dc:creator>
    <dc:date>2023-12-28T00:54:24Z</dc:date>
    <item>
      <title>Help design my streaming pipeline</title>
      <link>https://community.databricks.com/t5/data-engineering/help-design-my-streaming-pipeline/m-p/55865#M30448</link>
      <description>&lt;P&gt;### Data Source&lt;BR /&gt;- AWS RDS&lt;BR /&gt;- Database migration tasks have been created using AWS DMS&lt;BR /&gt;- The relevant CDC information is stored in a specific S3 bucket&lt;/P&gt;&lt;P&gt;### Data frequency&lt;BR /&gt;- Once a day (the exact time is unknown, sometime after 6pm)&lt;/P&gt;&lt;P&gt;### Development environment&lt;BR /&gt;- Databricks&lt;BR /&gt;- Delta Live Tables (Databricks)&lt;/P&gt;&lt;P&gt;### Data Status&lt;BR /&gt;- CLOSE_DT, CURR_F_CD, and CURR_T_CD form the primary key and are used as JOIN conditions&lt;BR /&gt;- CLOSE_DT is of DATE type&lt;BR /&gt;- Data comes in from the source (RDS) once a day on weekdays.&lt;BR /&gt;- This data is written to S3 as CDC records via AWS DMS&lt;/P&gt;&lt;P&gt;### Processing requirements&lt;BR /&gt;- No data comes into the source on non-weekdays and holidays, but those days must still be filled with the most recent data.&lt;BR /&gt;- Data comes in once a day on weekdays, and the presence or absence of a specific CLOSE_DT can be used to determine whether today's data has come in.&lt;BR /&gt;- For example, let's say today is 2023-12-28.&lt;BR /&gt;- You don't know when data with a CLOSE_DT of 2023-12-28 will come in today.&lt;BR /&gt;- So until the data comes in, you create the 2023-12-28 data from the most recent data (CLOSE_DT 2023-12-27).&lt;BR /&gt;- When the actual 2023-12-28 data comes in, it replaces the generated data.&lt;BR /&gt;- No data comes in at all on holidays, so each holiday's data must be generated from the most recent available data.&lt;/P&gt;</description>
      <pubDate>Thu, 28 Dec 2023 00:54:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/help-design-my-streaming-pipeline/m-p/55865#M30448</guid>
      <dc:creator>rt-slowth</dc:creator>
      <dc:date>2023-12-28T00:54:24Z</dc:date>
    </item>
  </channel>
</rss>

