According to the documentation at https://docs.databricks.com/en/structured-streaming/delta-lake.html#complete-mode, the “complete” output mode is supposed to “replace the entire table with every batch”. However, it does not behave that way in my case.
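For reference, my understanding of complete mode comes from plain Structured Streaming, where it is set on the writer, typically together with an aggregation. A minimal sketch of that understanding (paths, schema, and names below are made up for illustration):

```python
# Minimal non-DLT sketch of how I understand "complete" mode: on every trigger,
# the sink receives the full re-computed aggregation result, replacing what is there.
# Paths, schema, and names are made up for illustration.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

dept_counts = (
    spark.readStream
    .format("csv")
    .option("header", "true")
    .schema("id INT, department STRING")   # streaming file sources need an explicit schema
    .load("/tmp/employees/")               # made-up source directory
    .groupBy("department")
    .count()
)

query = (
    dept_counts.writeStream
    .outputMode("complete")                # complete mode is set on the writer
    .format("memory")                      # in-memory sink, just for illustration
    .queryName("department_counts")
    .start()
)
```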
Here is how I reproduce the issue:
First, I placed a single file named `employee_01.csv` in ADLS. Then I used the following Python code to read it and generate a table:
```python
import dlt

outputMode = "complete"

default_spark_options = {
    "cloudFiles.format": "csv",
    "delimiter": "\x01",
    "inferSchema": "true",
}

@dlt.table(
    name=table_01,  # table_01 and source_path are defined elsewhere in my notebook
)
def create_raw_table():
    path = source_path
    df = (
        spark.readStream
        .outputMode(outputMode)
        .format("cloudFiles")
        .options(**default_spark_options)
        .load(path)
    )
    return df
```
The data loads and the table is created successfully on the first run.
Then I uploaded another file to ADLS and triggered the DLT pipeline again.
However, when the DLT pipeline finished, the resulting table contained the combined data from both runs.
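To illustrate what I mean, a count check along these lines (the table name below is just a stand-in for the actual published name) would show the rows of `employee_01.csv` plus the newly uploaded file together, rather than only the latest file:

```python
# Hypothetical check after the second run: "my_schema.table_01" stands in for the
# actual published table name. The count covers rows from both CSV files, i.e.
# the second run appended to the table instead of replacing it.
spark.table("my_schema.table_01").count()
```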
Am I understanding the `complete` output mode incorrectly?