<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Ingesting data from views in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145482#M52508</link>
    <description>&lt;P&gt;You can try using Change Data Feed in combination with Structured Streaming to incrementally process changes from Delta tables.&lt;/P&gt;&lt;P&gt;Hope it helps.&lt;/P&gt;</description>
    <pubDate>Wed, 28 Jan 2026 03:28:45 GMT</pubDate>
    <dc:creator>yoongkang</dc:creator>
    <dc:date>2026-01-28T03:28:45Z</dc:date>
    <item>
      <title>Ingesting data from views</title>
      <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145369#M52487</link>
      <description>&lt;P&gt;Hi all,&lt;/P&gt;&lt;P&gt;I have been looking to create gold tables from views, and I am also considering streaming and change data capture features.&lt;/P&gt;&lt;P&gt;I know this is not possible in DLT workflows, so I was wondering whether there is any other way to achieve the same thing.&lt;BR /&gt;The data ingestion from views will be daily, and I want to preserve history in my gold table and add the latest data from the views after transformations.&lt;/P&gt;</description>
      <pubDate>Tue, 27 Jan 2026 09:08:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145369#M52487</guid>
      <dc:creator>naman0012</dc:creator>
      <dc:date>2026-01-27T09:08:39Z</dc:date>
    </item>
    <item>
      <title>Re: Ingesting data from views</title>
      <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145481#M52507</link>
      <description>&lt;P&gt;If you are already using Declarative Pipelines, you can indeed encapsulate the transformation logic in a view inside the pipeline. That view would read from a source (let's say a silver streaming table) and populate a (let's say) gold MV.&lt;/P&gt;
&lt;P&gt;So: raw files (Auto Loader) --&amp;gt; bronze (append streaming table) --&amp;gt; silver (SCD streaming table) --&amp;gt; view (some additional transformations, joins) --&amp;gt; gold (MV).&lt;/P&gt;
&lt;P&gt;This is already possible.&lt;/P&gt;
&lt;P&gt;Maybe I'm misunderstanding the request. Feel free to clarify.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 02:29:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145481#M52507</guid>
      <dc:creator>MoJaMa</dc:creator>
      <dc:date>2026-01-28T02:29:12Z</dc:date>
    </item>
    <item>
      <title>Re: Ingesting data from views</title>
      <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145482#M52508</link>
      <description>&lt;P&gt;You can try using Change Data Feed in combination with Structured Streaming to incrementally process changes from Delta tables.&lt;/P&gt;&lt;P&gt;Hope it helps.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 03:28:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145482#M52508</guid>
      <dc:creator>yoongkang</dc:creator>
      <dc:date>2026-01-28T03:28:45Z</dc:date>
    </item>
    <item>
      <title>Re: Ingesting data from views</title>
      <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145490#M52511</link>
      <description>&lt;P&gt;I believe you are looking for two things here:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Incrementally refreshing gold tables (materialized views)&lt;/LI&gt;&lt;LI&gt;History on those tables (materialized views)&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;You can get the first feature with materialized views in the gold layer if you use SDP (Serverless Pipelines, powered by Enzyme). History is still problematic with MVs. The second requirement needs a regular Delta table; that is when you will have to switch to using Delta CDF (Change Data Feed) with Structured Streaming.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 05:54:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145490#M52511</guid>
      <dc:creator>pradeep_singh</dc:creator>
      <dc:date>2026-01-28T05:54:26Z</dc:date>
    </item>
    <item>
      <title>Re: Ingesting data from views</title>
      <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145492#M52512</link>
      <description>&lt;P&gt;Start from Bronze ingestion (with metadata tables), refine to Silver (with mdd), then aggregate to Gold MLVs for optimized Delta persistence. Schedule full/incremental modes via Fabric's UI, integrating data quality rules for governance. I think Fabric's features are a good fit here.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 06:01:15 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/145492#M52512</guid>
      <dc:creator>yoongkang</dc:creator>
      <dc:date>2026-01-28T06:01:15Z</dc:date>
    </item>
    <item>
      <title>Re: Ingesting data from views</title>
      <link>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/150173#M53286</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/211996"&gt;@naman0012&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;There are a few different approaches depending on where these "views" live and what your exact architecture looks like. Let me walk through the options.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;CLARIFYING THE SCENARIO&lt;/P&gt;
&lt;P&gt;The key question is: are these database views on an external system (SQL Server, Oracle, Postgres, etc.), or are they views/tables already inside Databricks (Unity Catalog)?&lt;/P&gt;
&lt;P&gt;The approaches differ significantly depending on the answer.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;OPTION 1: SOURCE VIEWS ARE IN AN EXTERNAL DATABASE&lt;/P&gt;
&lt;P&gt;If you are reading from views in an external database and want to land that data into gold Delta tables with history tracking, Lakeflow Spark Declarative Pipelines (SDP) streaming tables cannot directly stream from external views. Streaming tables require append-only sources like Auto Loader (cloud files), Kafka, or Delta tables with change data feed.&lt;/P&gt;
&lt;P&gt;Instead, you can use a batch ingestion pattern:&lt;/P&gt;
&lt;P&gt;1. Use a scheduled notebook or workflow task to read from the external view via JDBC/query federation and write into a Bronze Delta table.&lt;/P&gt;
&lt;P&gt;2. Enable Change Data Feed on the Bronze table:&lt;/P&gt;
&lt;P&gt;ALTER TABLE catalog.schema.bronze_table&lt;BR /&gt;SET TBLPROPERTIES (delta.enableChangeDataFeed = true);&lt;/P&gt;
&lt;P&gt;3. Then use Structured Streaming with Change Data Feed to process incremental changes downstream:&lt;/P&gt;
&lt;P&gt;from pyspark.sql import functions as F&lt;/P&gt;
&lt;P&gt;changes_df = (&lt;BR /&gt;spark.readStream&lt;BR /&gt;.option("readChangeFeed", "true")&lt;BR /&gt;.table("catalog.schema.bronze_table")&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;# Apply transformations&lt;BR /&gt;gold_df = changes_df.filter(...).withColumn(...)&lt;/P&gt;
&lt;P&gt;(gold_df.writeStream&lt;BR /&gt;.outputMode("append")&lt;BR /&gt;.option("checkpointLocation", "/path/to/checkpoint")&lt;BR /&gt;.toTable("catalog.schema.gold_table")&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;For history preservation specifically, you can use the _change_type, _commit_version, and _commit_timestamp metadata columns from the change data feed to build an audit/history table that tracks all changes over time.&lt;/P&gt;
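&lt;P&gt;As a rough illustration of that bookkeeping, here is a plain-Python sketch (not Spark code) of folding change-feed rows into an append-only audit log. The column names mirror the Delta CDF metadata columns; the row values and the customer_id/amount fields are hypothetical.&lt;/P&gt;

```python
# Plain-Python sketch: fold change data feed rows into an append-only
# audit log. Column names mirror Delta CDF metadata columns; data is
# hypothetical.
def build_audit_log(cdf_rows):
    audit = []
    for row in cdf_rows:
        # update_preimage rows describe the old value; the postimage row
        # for the same commit already captures the new state, so skip them.
        if row["_change_type"] == "update_preimage":
            continue
        audit.append({
            "customer_id": row["customer_id"],
            "amount": row["amount"],
            "change_type": row["_change_type"],
            "commit_version": row["_commit_version"],
            "commit_timestamp": row["_commit_timestamp"],
        })
    return audit

changes = [
    {"customer_id": 1, "amount": 10, "_change_type": "insert",
     "_commit_version": 1, "_commit_timestamp": "2026-01-27"},
    {"customer_id": 1, "amount": 10, "_change_type": "update_preimage",
     "_commit_version": 2, "_commit_timestamp": "2026-01-28"},
    {"customer_id": 1, "amount": 25, "_change_type": "update_postimage",
     "_commit_version": 2, "_commit_timestamp": "2026-01-28"},
]
log = build_audit_log(changes)
```

&lt;P&gt;In Databricks the same fold would be the streaming write shown above, with the skip expressed as a filter on _change_type.&lt;/P&gt;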
&lt;P&gt;&lt;BR /&gt;OPTION 2: USE SDP WITH AUTO CDC FOR SCD TYPE 2&lt;/P&gt;
&lt;P&gt;If you want automatic history tracking (slowly changing dimension Type 2), Lakeflow Spark Declarative Pipelines (SDP) supports this natively through the AUTO CDC API. You would:&lt;/P&gt;
&lt;P&gt;1. Land data into a Bronze streaming table (via Auto Loader or another supported source).&lt;BR /&gt;2. Use AUTO CDC with SCD Type 2 to automatically maintain history in your gold table.&lt;/P&gt;
&lt;P&gt;In Python (using the current pyspark.pipelines module):&lt;/P&gt;
&lt;P&gt;from pyspark import pipelines as dp&lt;BR /&gt;from pyspark.sql.functions import col&lt;/P&gt;
&lt;P&gt;dp.create_auto_cdc_flow(&lt;BR /&gt;target="gold_customers",&lt;BR /&gt;source="bronze_customers",&lt;BR /&gt;keys=["customer_id"],&lt;BR /&gt;sequence_by=col("updated_at"),&lt;BR /&gt;stored_as_scd_type=2&lt;BR /&gt;)&lt;/P&gt;
&lt;P&gt;This automatically creates __START_AT and __END_AT columns to track the validity period of each record version. SCD Type 2 preserves full history of changes.&lt;/P&gt;
&lt;P&gt;In SQL:&lt;/P&gt;
&lt;P&gt;CREATE OR REFRESH STREAMING TABLE gold_customers;&lt;/P&gt;
&lt;P&gt;AUTO CDC INTO gold_customers&lt;BR /&gt;FROM bronze_customers&lt;BR /&gt;KEYS (customer_id)&lt;BR /&gt;SEQUENCE BY updated_at&lt;BR /&gt;STORED AS SCD TYPE 2;&lt;/P&gt;
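&lt;P&gt;To make the __START_AT / __END_AT mechanics concrete, here is a plain-Python sketch of the idea (this is not the AUTO CDC implementation, just the bookkeeping it performs, on hypothetical data):&lt;/P&gt;

```python
# Plain-Python sketch of SCD Type 2 bookkeeping: when a change arrives
# for a key, close out the currently open record and append a new
# version. Field names echo the __START_AT/__END_AT columns that
# AUTO CDC maintains; the data is hypothetical.
def apply_scd2(history, key, new_value, ts):
    for rec in history:
        if rec["customer_id"] == key and rec["__END_AT"] is None:
            rec["__END_AT"] = ts  # close the previous version
    history.append({
        "customer_id": key,
        "value": new_value,
        "__START_AT": ts,
        "__END_AT": None,  # open-ended: this is the current version
    })
    return history

h = []
apply_scd2(h, 1, "bronze", "2026-01-27")
apply_scd2(h, 1, "silver", "2026-01-28")
```

&lt;P&gt;Querying only rows where __END_AT is null then gives the current state, while the full table preserves every historical version.&lt;/P&gt;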
&lt;P&gt;&lt;BR /&gt;OPTION 3: MATERIALIZED VIEWS FOR SIMPLER AGGREGATION&lt;/P&gt;
&lt;P&gt;If you do not need full row-level history but want a gold table that stays current with transformations applied, materialized views in SDP are a good fit. They incrementally refresh from upstream streaming tables or other Delta tables:&lt;/P&gt;
&lt;P&gt;CREATE MATERIALIZED VIEW gold_summary AS&lt;BR /&gt;SELECT&lt;BR /&gt;customer_id,&lt;BR /&gt;COUNT(*) AS order_count,&lt;BR /&gt;SUM(amount) AS total_amount&lt;BR /&gt;FROM silver_orders&lt;BR /&gt;GROUP BY customer_id;&lt;/P&gt;
&lt;P&gt;Materialized views support automatic incremental refresh when the source tables have row tracking and change data feed enabled. You can schedule refreshes or use TRIGGER ON UPDATE for production freshness.&lt;/P&gt;
&lt;P&gt;Note: materialized views replace their contents on each refresh, so they do not preserve history by default. For history, pair them with a separate history/audit table or use the SCD Type 2 approach above.&lt;/P&gt;
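&lt;P&gt;One simple shape for such a companion history table is a daily snapshot append, sketched here in plain Python with hypothetical data (in Databricks this would be an append write to a separate Delta table alongside the MV refresh):&lt;/P&gt;

```python
# Plain-Python sketch: a materialized view's contents are replaced on
# each refresh, so history is kept by appending each refresh's snapshot,
# stamped with the load date, to a separate table.
def append_snapshot(history_table, snapshot, load_date):
    for row in snapshot:
        history_table.append({**row, "load_date": load_date})
    return history_table

hist = []
append_snapshot(hist, [{"customer_id": 1, "order_count": 3}], "2026-01-27")
append_snapshot(hist, [{"customer_id": 1, "order_count": 5}], "2026-01-28")
```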
&lt;P&gt;&lt;BR /&gt;RECOMMENDED ARCHITECTURE FOR YOUR USE CASE&lt;/P&gt;
&lt;P&gt;Based on your requirements (daily ingestion from views, history preservation, transformations in gold), here is the recommended pattern:&lt;/P&gt;
&lt;P&gt;1. BRONZE: Scheduled notebook/task reads from external views daily via JDBC and writes (appends or merges) into a Bronze Delta table with Change Data Feed enabled.&lt;/P&gt;
&lt;P&gt;2. SILVER: Use SDP with a streaming table to clean and validate the data.&lt;/P&gt;
&lt;P&gt;3. GOLD: Use AUTO CDC with SCD Type 2 to maintain a gold table with full history tracking. This gives you the change data capture semantics you are looking for.&lt;/P&gt;
&lt;P&gt;This pattern ensures daily incremental processing, full history in your gold layer, and clean separation of concerns across the medallion architecture.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;USEFUL DOCUMENTATION&lt;/P&gt;
&lt;P&gt;Change Data Feed:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/delta/delta-change-data-feed" target="_blank"&gt;https://docs.databricks.com/aws/en/delta/delta-change-data-feed&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Materialized Views:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/views/materialized" target="_blank"&gt;https://docs.databricks.com/aws/en/views/materialized&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;AUTO CDC in SDP:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/delta-live-tables/cdc" target="_blank"&gt;https://docs.databricks.com/aws/en/delta-live-tables/cdc&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Structured Streaming with Delta:&lt;BR /&gt;&lt;A href="https://docs.databricks.com/aws/en/structured-streaming/delta-lake" target="_blank"&gt;https://docs.databricks.com/aws/en/structured-streaming/delta-lake&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;Hope this helps. Let me know if you can share more about where the source views live and I can refine the recommendation further.&lt;/P&gt;
&lt;P&gt;* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.&lt;/P&gt;</description>
      <pubDate>Sun, 08 Mar 2026 07:26:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/ingesting-data-from-views/m-p/150173#M53286</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T07:26:06Z</dc:date>
    </item>
  </channel>
</rss>

