<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Simple append for a DLT in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111350#M43858</link>
    <description>&lt;P&gt;for reference, here are the json pipeline settings:&lt;BR /&gt;&lt;BR /&gt;{&lt;BR /&gt;"id": "96e670ba-....",&lt;BR /&gt;"pipeline_type": "WORKSPACE",&lt;BR /&gt;"development": true,&lt;BR /&gt;"continuous": false,&lt;BR /&gt;"channel": "CURRENT",&lt;BR /&gt;"photon": true,&lt;BR /&gt;"libraries": [&lt;BR /&gt;{&lt;BR /&gt;"notebook": {&lt;BR /&gt;"path": "/Users/.../dummy_dlt"&lt;BR /&gt;}&lt;BR /&gt;}&lt;BR /&gt;],&lt;BR /&gt;"name": "dlt_view_to_table",&lt;BR /&gt;"serverless": true,&lt;BR /&gt;"catalog": "tabular",&lt;BR /&gt;"schema": "dataexpert",&lt;BR /&gt;"data_sampling": false&lt;BR /&gt;}&lt;/P&gt;</description>
    <pubDate>Thu, 27 Feb 2025 05:38:43 GMT</pubDate>
    <dc:creator>jrod123</dc:creator>
    <dc:date>2025-02-27T05:38:43Z</dc:date>
    <item>
      <title>Simple append for a DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111169#M43817</link>
      <description>&lt;P&gt;Looking for some help getting unstuck with appending to DLTs in Databricks. I have successfully extracted data from an API endpoint, done some initial data cleaning/processing, and stored that data in a DLT. Great start. But I noticed that each time the pipeline runs, all of the previous rows are overwritten. The AI assistant and separate Google searches have so far not helped me understand why I cannot simply append data from each run to the DLT. I manually added a timestamp column to ensure that each run's data is unique, and each time the pipeline runs I can verify that the data is fresh. I only see the new data (the old data is overwritten).&lt;/P&gt;
&lt;P&gt;According to my research, append is supposedly the default behavior when writing to a DLT, but that's not happening and I don't understand why. Attempts to explicitly define the append properties for the DLT (both in the notebook and in the pipeline settings) have not helped. Here is a simple example of what I'm trying (and failing) to do:&lt;/P&gt;
&lt;DIV&gt;&lt;DIV&gt;import dlt&lt;/DIV&gt;
&lt;DIV&gt;from pyspark.sql.functions import current_timestamp&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;# Function to generate sample data&lt;/DIV&gt;
&lt;DIV&gt;def generate_data():&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; data = [&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (1, "A"),&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (2, "B"),&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (3, "C")&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; ]&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; df = spark.createDataFrame(data, ["id", "value"])&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; df = df.withColumn("timestamp", current_timestamp())&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return df&lt;/DIV&gt;
&lt;BR /&gt;
&lt;DIV&gt;# Define the Delta Live Table&lt;/DIV&gt;
&lt;DIV&gt;@dlt.table(&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; name="example_table",&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; comment="A simple example table",&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; table_properties={"pipelines.appendOnly": "true"}&lt;/DIV&gt;
&lt;DIV&gt;)&lt;/DIV&gt;
&lt;DIV&gt;def create_example_table():&lt;/DIV&gt;
&lt;DIV&gt;&amp;nbsp; &amp;nbsp; return generate_data()&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Tue, 25 Feb 2025 22:34:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111169#M43817</guid>
      <dc:creator>jrod123</dc:creator>
      <dc:date>2025-02-25T22:34:44Z</dc:date>
    </item>
    <item>
      <title>Re: Simple append for a DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111306#M43850</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/150820"&gt;@jrod123&lt;/a&gt;, can you please try the method below?&lt;/P&gt;&lt;P&gt;1. Create a DLT view to hold the API data first. If possible, fetch only incremental data from the API.&lt;/P&gt;&lt;P&gt;@dlt.view&lt;/P&gt;&lt;P&gt;def api_data_view():&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; return api_df&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;2. Define your DLT table and append the view to your target table.&lt;/DIV&gt;&lt;DIV&gt;&lt;P&gt;@dlt.table&lt;/P&gt;&lt;P&gt;def target_table():&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; df = spark.read.table("api_data_view")  # append view data&lt;/P&gt;&lt;P&gt;&amp;nbsp; &amp;nbsp; return df&lt;/P&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;This way, the API transformations are separated into a view, and the data is then appended to the table.&lt;/P&gt;</description>
      <pubDate>Wed, 26 Feb 2025 23:43:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111306#M43850</guid>
      <dc:creator>KaranamS</dc:creator>
      <dc:date>2025-02-26T23:43:10Z</dc:date>
    </item>
    <item>
      <title>Re: Simple append for a DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111349#M43857</link>
      <description>&lt;P&gt;Creating a view first and then a table, as you suggested, still produces the same result: data in the table is overwritten (rather than appended) with each run of the pipeline. Here's a simple code example that I used:&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;BR /&gt;from pyspark.sql.functions import lit&lt;BR /&gt;import datetime&lt;BR /&gt;import dlt&lt;/P&gt;&lt;P&gt;# Initialize Spark session&lt;BR /&gt;spark = SparkSession.builder.appName("Data Ingestion").getOrCreate()&lt;/P&gt;&lt;P&gt;from pyspark.sql.functions import current_timestamp&lt;/P&gt;&lt;P&gt;# Function to generate sample data&lt;BR /&gt;def generate_data():&lt;BR /&gt;&amp;nbsp; &amp;nbsp; data = [&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (1, "A"),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (2, "B"),&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; (3, "C")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; ]&lt;BR /&gt;&amp;nbsp; &amp;nbsp; df = spark.createDataFrame(data, ["id", "value"])&lt;BR /&gt;&amp;nbsp; &amp;nbsp; df = df.withColumn("timestamp", lit(datetime.datetime.now()))&lt;BR /&gt;&amp;nbsp; &amp;nbsp; return df&lt;/P&gt;&lt;P&gt;# Define DLT view and table&lt;/P&gt;&lt;P&gt;@dlt.view(&lt;BR /&gt;&amp;nbsp; &amp;nbsp; name="example_view"&lt;BR /&gt;)&lt;BR /&gt;def create_example_view():&lt;BR /&gt;&amp;nbsp; &amp;nbsp; return generate_data()&lt;/P&gt;&lt;P&gt;# Define the Delta Live Table&lt;BR /&gt;@dlt.table(&lt;BR /&gt;&amp;nbsp; &amp;nbsp; name="example_table"&lt;BR /&gt;)&lt;BR /&gt;def create_example_table():&lt;BR /&gt;&amp;nbsp; &amp;nbsp; df = spark.read.table("example_view")&lt;BR /&gt;&amp;nbsp; &amp;nbsp; return generate_data()&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2025 05:33:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111349#M43857</guid>
      <dc:creator>jrod123</dc:creator>
      <dc:date>2025-02-27T05:33:17Z</dc:date>
    </item>
    <item>
      <title>Re: Simple append for a DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111350#M43858</link>
      <description>&lt;P&gt;For reference, here are the JSON pipeline settings:&lt;BR /&gt;&lt;BR /&gt;{&lt;BR /&gt;&amp;nbsp; "id": "96e670ba-....",&lt;BR /&gt;&amp;nbsp; "pipeline_type": "WORKSPACE",&lt;BR /&gt;&amp;nbsp; "development": true,&lt;BR /&gt;&amp;nbsp; "continuous": false,&lt;BR /&gt;&amp;nbsp; "channel": "CURRENT",&lt;BR /&gt;&amp;nbsp; "photon": true,&lt;BR /&gt;&amp;nbsp; "libraries": [&lt;BR /&gt;&amp;nbsp; &amp;nbsp; {&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; "notebook": {&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; "path": "/Users/.../dummy_dlt"&lt;BR /&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; }&lt;BR /&gt;&amp;nbsp; &amp;nbsp; }&lt;BR /&gt;&amp;nbsp; ],&lt;BR /&gt;&amp;nbsp; "name": "dlt_view_to_table",&lt;BR /&gt;&amp;nbsp; "serverless": true,&lt;BR /&gt;&amp;nbsp; "catalog": "tabular",&lt;BR /&gt;&amp;nbsp; "schema": "dataexpert",&lt;BR /&gt;&amp;nbsp; "data_sampling": false&lt;BR /&gt;}&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2025 05:38:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/111350#M43858</guid>
      <dc:creator>jrod123</dc:creator>
      <dc:date>2025-02-27T05:38:43Z</dc:date>
    </item>
    <item>
      <title>Re: Simple append for a DLT</title>
      <link>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/112639#M44278</link>
      <description>&lt;P&gt;I am likewise struggling with this. All DLT configurations that I've tried (including spark_conf={"pipelines.autoOptimize.appendOnly": "true"}) just yield overwrites of the existing data.&lt;/P&gt;&lt;P&gt;Any luck,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/150820"&gt;@jrod123&lt;/a&gt;?&lt;/P&gt;</description>
      <pubDate>Fri, 14 Mar 2025 22:51:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/simple-append-for-a-dlt/m-p/112639#M44278</guid>
      <dc:creator>tastefulSamurai</dc:creator>
      <dc:date>2025-03-14T22:51:55Z</dc:date>
    </item>
  </channel>
</rss>

