Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Simple append for a DLT

jrod123
New Contributor II

Looking for some help getting unstuck with appending to DLTs in Databricks. I have successfully extracted data via an API endpoint, done some initial data cleaning/processing, and stored that data in a DLT. Great start. But I noticed that each time the pipeline runs, all of the previous rows are overwritten. The AI assistant and separate Google searches have so far not helped me understand why I cannot simply append data from each run to the DLT. I manually added a timestamp column to ensure that each run's data is unique, and each time it runs I can verify that the data is fresh; I just only see the new data (the old is overwritten).

According to my research, append is supposedly the default behavior when writing to a DLT, but that's not happening and I don't understand why. Attempts to explicitly define the append properties for the DLT (both in the notebook and in the pipeline settings) have not helped. Here is a simple example of what I'm trying (and failing) to do:

import dlt
from pyspark.sql.functions import current_timestamp

# Function to generate sample data
def generate_data():
    data = [
        (1, "A"),
        (2, "B"),
        (3, "C")
    ]
    df = spark.createDataFrame(data, ["id", "value"])
    df = df.withColumn("timestamp", current_timestamp())
    return df

# Define the Delta Live Table
@dlt.table(
    name="example_table",
    comment="A simple example table",
    table_properties={"pipelines.appendOnly": "true"}
)
def create_example_table():
    return generate_data()
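
For comparison, outside of a DLT pipeline the kind of append I'm expecting would just be a plain Delta write like the snippet below (illustrative only; the table name is a placeholder):

# Plain (non-DLT) comparison: an explicit Delta append, which is the
# behavior I expected the pipeline to reproduce on each run.
# "main.default.example_table" is a placeholder table name.
generate_data().write.mode("append").saveAsTable("main.default.example_table")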

 

4 REPLIES

KaranamS
Contributor III

Hi @jrod123, can you please try the method below?

1. Create a DLT view to store the API data first. If possible, get only incremental data from the API.

@dlt.view
def api_data_view():
    return api_df

2. Define your DLT table and append the view to your target table

@dlt.table
def target_table():
    df = spark.read.table("api_data_view")  # append view data
    return df

This way we separate the API transformations into a view and then append that data to the target table.
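
Putting both steps together, a minimal end-to-end sketch would look like this (fetch_api_data() is just a placeholder for your actual API call and cleaning logic):

import dlt
from pyspark.sql.functions import current_timestamp

# Placeholder for your API extraction + initial cleaning; returns a DataFrame.
def fetch_api_data():
    data = [(1, "A"), (2, "B"), (3, "C")]
    return spark.createDataFrame(data, ["id", "value"])

# Step 1: keep the API call and transformations in a view.
# If the API supports it, request only records newer than your last load.
@dlt.view(name="api_data_view")
def api_data_view():
    return fetch_api_data().withColumn("load_timestamp", current_timestamp())

# Step 2: define the target table from the view.
@dlt.table(name="target_table")
def target_table():
    return spark.read.table("api_data_view")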

jrod123
New Contributor II

Creating a view first and then a table, as you suggested, still produces the same result: the data in the table is overwritten (rather than appended) with each run of the pipeline. Here's a simple code example that I used:

from pyspark.sql import SparkSession
from pyspark.sql.functions import lit, current_timestamp
import datetime
import dlt

# Initialize Spark session
spark = SparkSession.builder.appName("Data Ingestion").getOrCreate()

# Function to generate sample data
def generate_data():
    data = [
        (1, "A"),
        (2, "B"),
        (3, "C")
    ]
    df = spark.createDataFrame(data, ["id", "value"])
    df = df.withColumn("timestamp", lit(datetime.datetime.now()))
    return df

# Define the DLT view
@dlt.view(
    name="example_view"
)
def create_example_view():
    return generate_data()

# Define the Delta Live Table
@dlt.table(
    name="example_table"
)
def create_example_table():
    df = spark.read.table("example_view")
    return df

jrod123
New Contributor II

For reference, here are the JSON pipeline settings:

{
  "id": "96e670ba-....",
  "pipeline_type": "WORKSPACE",
  "development": true,
  "continuous": false,
  "channel": "CURRENT",
  "photon": true,
  "libraries": [
    {
      "notebook": {
        "path": "/Users/.../dummy_dlt"
      }
    }
  ],
  "name": "dlt_view_to_table",
  "serverless": true,
  "catalog": "tabular",
  "schema": "dataexpert",
  "data_sampling": false
}

tastefulSamurai
New Contributor II

I am likewise struggling with this. All DLT configurations that I've tried (including spark_conf={"pipelines.autoOptimize.appendOnly": "true"}) just yield overwrites of the existing data.
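
For reference, this is roughly how I'm attaching that config (the table name and source below are placeholders for my real ones):

import dlt
from pyspark.sql.functions import current_timestamp

@dlt.table(
    name="my_table",  # placeholder name
    spark_conf={"pipelines.autoOptimize.appendOnly": "true"}
)
def my_table():
    # Placeholder source; my real query reads from an upstream view.
    return spark.range(3).withColumn("ts", current_timestamp())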
 
Any luck, @jrod123?
