OutputMode “complete” unable to replace the entire table

guangyi — Mon, 08 Jul 2024 05:02:16 GMT

According to the documentation at https://docs.databricks.com/en/structured-streaming/delta-lake.html#complete-mode, the “complete” output mode should “replace the entire table with every batch”. However, it does not work that way in my case.
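A rough way to picture what the documentation describes: in Spark Structured Streaming, complete mode rewrites the whole sink table on every trigger, but the table it writes is the aggregate over *all* input received so far, not just the latest batch. Spark only accepts complete mode for queries that contain a streaming aggregation. Append mode, by contrast, only adds the new batch's rows. The pure-Python sketch below is a hypothetical simulation of those two sink behaviours (the names `run_stream` and `count_by_dept` are illustrative, not Spark or Databricks APIs).

```python
from collections import Counter


def count_by_dept(rows):
    """Hypothetical aggregation used as the 'streaming query': row count per department."""
    return dict(Counter(r["dept"] for r in rows))


def run_stream(batches, mode):
    """Simulate the contents of a sink table after processing micro-batches in order.

    'append'   -> rows of each new batch are added (modelled without aggregation,
                  since Spark needs a watermark to aggregate in append mode).
    'complete' -> the table is replaced by the aggregate over ALL input seen so far.
    """
    seen = []   # all input received so far (Spark keeps the equivalent as aggregation state)
    table = []  # current contents of the sink table
    for batch in batches:
        seen.extend(batch)
        if mode == "append":
            table.extend(batch)
        elif mode == "complete":
            table = sorted(count_by_dept(seen).items())
        else:
            raise ValueError(f"unsupported mode: {mode}")
    return table


# Usage: two micro-batches, e.g. one per uploaded file (illustrative data).
b1 = [{"name": "alice", "dept": "eng"}, {"name": "bob", "dept": "ops"}]
b2 = [{"name": "carol", "dept": "eng"}]
print(run_stream([b1, b2], "complete"))  # [('eng', 2), ('ops', 1)]
```

In this model, even complete mode reflects both batches after the second trigger, because the result table it rewrites is recomputed from everything seen so far.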

Here is how I reproduced the issue:

First, I prepared a single file named `employee_01.csv` in ADLS. Then I used the following Python code to read data from it and create a table:

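```python
import dlt

outputMode = 'complete'

default_spark_options = {
    "cloudFiles.format": "csv",
    "delimiter": "\x01",
    "inferSchema": "true",
}

# table_01 and source_path are defined elsewhere in the pipeline
@dlt.table(
    name=table_01,
)
def create_raw_table():
    path = source_path
    df = (spark.readStream
          # .outputMode(outputMode)  # outputMode() is a DataStreamWriter method; DataStreamReader has none
          .format("cloudFiles")
          .options(**default_spark_options)
          .load(path))
    return df
```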

I was able to load the data and create the table successfully.

Then I uploaded another file to ADLS and triggered the DLT pipeline again.

However, after the DLT pipeline finished running, the table seems to contain the results of both runs together.
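One way to see why this happens: Auto Loader (`cloudFiles`) records the files it has already processed in its checkpoint, so each pipeline update reads only newly arrived files, and a streaming table built from `spark.readStream` appends their rows instead of replacing the table. The pure-Python sketch below is a hypothetical model of that behaviour (the `ingest` helper and the second file name `employee_02.csv` are illustrative assumptions, not part of Auto Loader's API or the original setup).

```python
def ingest(files, processed, table):
    """Model one pipeline update over a directory listing.

    files     -- dict mapping file name -> list of rows (the directory contents)
    processed -- set of file names already recorded in the (simulated) checkpoint
    table     -- list of rows in the streaming table; new rows are appended, never replaced
    Returns the names of the files read during this update.
    """
    new_files = sorted(name for name in files if name not in processed)
    for name in new_files:
        table.extend(files[name])  # append the new file's rows to the table
        processed.add(name)        # remember the file so later updates skip it
    return new_files


# Usage: first update ingests employee_01.csv; a hypothetical second file arrives later.
files = {"employee_01.csv": [("alice", "eng"), ("bob", "ops")]}
processed, table = set(), []
ingest(files, processed, table)
files["employee_02.csv"] = [("carol", "eng")]
ingest(files, processed, table)
print(len(table))  # 3
```

Because the first file is skipped on the second update but its rows are still in the table, the table ends up holding rows from both files.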

Am I understanding the `complete` outputMode incorrectly?

Re: OutputMode “complete” unable to replace the entire table

guangyi — Mon, 08 Jul 2024 06:54:49 GMT

I have figured it out already, but I cannot find the delete button. Please ignore this post.
