
Trying to create an incremental pipeline, but it fails when I try to use outputMode "update"

BorislavBlagoev
Valued Contributor III
def upsertToDelta(microBatchOutputDF, batchId): 
  
  microBatchOutputDF.createOrReplaceTempView("updates")
 
  microBatchOutputDF._jdf.sparkSession().sql("""
    MERGE INTO old o
    USING updates u
    ON u.id = o.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
  """)
 
stream_new_df = spark.readStream.format("delta").load(new_data_frame_path)
stream_old_df = spark.readStream.format("delta").load(old_data_frame_path)
 
stream_old_df.createOrReplaceTempView("old")
 
stream_new_df.writeStream.format("delta") \
            .option("checkpointLocation", "") \
            .option("mergeSchema", "true") \
            .option("path", "") \
            .foreachBatch(upsertToDelta) \
            .trigger(once=True) \
            .outputMode("update") \
            .table("")

I'm trying to execute this code but I get the following error:

Data source com.databricks.sql.transaction.tahoe.sources.DeltaDataSource does not support Update output mode

1 ACCEPTED SOLUTION

Hubert-Dudek
Esteemed Contributor III

The Delta table/file version is too old. Please try to upgrade it as described here: https://docs.microsoft.com/en-us/azure/databricks/delta/versioning
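A minimal sketch of such an upgrade, following the pattern in the linked versioning docs; the table path and the exact version numbers below are illustrative placeholders, so check the docs for the protocol versions your runtime actually supports:

```sql
%sql
-- Raise the table's minimum reader/writer protocol versions
-- (path and version numbers are placeholders, not from the original post)
ALTER TABLE delta.`/path/to/your/table`
SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5'
)
```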


9 REPLIES

Hubert-Dudek
Esteemed Contributor III

The Delta table/file version is too old. Please try to upgrade it as described here: https://docs.microsoft.com/en-us/azure/databricks/delta/versioning

Which is the latest version?

@Hubert Dudek I get the same error:

AnalysisException: Data source com.databricks.sql.transaction.tahoe.sources.DeltaDataSource does not support Update output mode

I tried both ways.

Hubert-Dudek
Esteemed Contributor III

Did it work? The Databricks Runtime can also be imported as an older one (like the one used by Data Factory).

I think you can also refactor the code a bit to use .start() in the last line instead of .table(), and change upsertToDelta to use something like this (it is in Scala, but the logic is similar in Python): https://docs.databricks.com/_static/notebooks/merge-in-streaming.html
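A rough Python translation of that notebook's pattern might look like the sketch below. It uses DeltaTable.forPath and the merge builder from the Delta Lake Python API shipped with Databricks runtimes; the names `spark` and `old_data_frame_path` are assumed from the original post, and `id` is assumed to be the join key. Treat this as a sketch, not a drop-in fix:

```python
def upsert_to_delta(micro_batch_df, batch_id):
    # Sketch of the merge-in-streaming pattern from the linked notebook,
    # translated to Python. Assumes the target Delta table lives at
    # old_data_frame_path (from the post above) and that `spark` is the
    # notebook's SparkSession.
    from delta.tables import DeltaTable  # ships with Databricks runtimes

    target = DeltaTable.forPath(spark, old_data_frame_path)
    (target.alias("o")
           .merge(micro_batch_df.alias("u"), "u.id = o.id")
           .whenMatchedUpdateAll()       # UPDATE SET *
           .whenNotMatchedInsertAll()    # INSERT *
           .execute())

# Hypothetical usage, ending the query with .start() as suggested above:
# stream_new_df.writeStream.foreachBatch(upsert_to_delta).trigger(once=True).start()
```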

@Hubert Dudek The runtime version is 9.1 LTS. And I want to use `.table()` because I want to have a table in my metastore/catalog.

@Hubert Dudek I also tried with the 10.2 runtime and with toTable(), but it's the same.

Hubert-Dudek
Esteemed Contributor III

To have the table in the metastore, just register your Delta location there using a separate SQL script (it is enough to do that one time):

%sql
CREATE TABLE IF NOT EXISTS your_db.your_table
( 
 id BIGINT NOT NULL,
 ......
)
USING DELTA
PARTITIONED BY (partition_column)
LOCATION 'path_to_your_delta'

@Hubert Dudek It works like that. I have one more question: how can I include deletes in that query?

  microBatchOutputDF._jdf.sparkSession().sql("""
    MERGE INTO old o
    USING updates u
    ON u.id = o.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
  """)

Or, how can I add and delete rows from this pipeline?
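One common pattern is a sketch like the following, which keeps the original temp-view approach and adds a `WHEN MATCHED ... THEN DELETE` clause. It assumes the incoming data carries a soft-delete flag column, here called `is_deleted`, which is not part of the original schema:

```python
def upsert_with_deletes(micro_batch_df, batch_id):
    # Same temp-view pattern as the original upsertToDelta. The `is_deleted`
    # column is an assumed soft-delete marker in the source data: rows flagged
    # as deleted are removed from the target, everything else is upserted.
    micro_batch_df.createOrReplaceTempView("updates")
    micro_batch_df._jdf.sparkSession().sql("""
        MERGE INTO old o
        USING updates u
        ON u.id = o.id
        WHEN MATCHED AND u.is_deleted = true THEN DELETE
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
    """)
```

Note the clause order matters: the `DELETE` branch is listed before the plain `UPDATE` branch so that deleted rows are not updated instead.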
