Databricks Community

User16826994223 · ‎06-25-2021

Is there a way to keep my Synapse database always in sync with the latest data from a Delta table? I believe Synapse doesn't support being used as a streaming sink. Is there any workaround?

User16826994223 · ‎06-25-2021

You could keep the data in sync by appending each new micro-batch DataFrame inside a foreachBatch on your write stream. This method lets you write the data in arbitrary ways, so you can connect to the data warehouse with JDBC if necessary. The stream could be set up something like this:

df = spark.readStream\
          .format('delta')\
          .load(input_path)
 
df_write = df.writeStream \
            .foreachBatch(batch_write_jdbc) \
            .option("checkpointLocation", checkpoint_path) \
            .start()

foreachBatch replaces the sink, so no output format or path is needed. start() only starts the stream, which calls the batch function for each micro-batch, and that function writes the data using JDBC.

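The batch function, which has to be defined before the stream is started, could look something like this:

def batch_write_jdbc(df, batchId):
    # Apply any transformation to the micro-batch DataFrame here.
    # Append the micro-batch to the Synapse table over JDBC.
    df.write.jdbc(jdbc_url, table=schema_name + "." + table_name, mode="append", properties=connection_properties)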
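The snippets above leave jdbc_url and connection_properties undefined. As a minimal sketch of one way they might be filled in, assuming the Synapse SQL endpoint is reached through the Microsoft SQL Server JDBC driver: the helper names synapse_jdbc_settings and make_batch_writer, their parameters, and the URL options are hypothetical and not from the original answer.

```python
def synapse_jdbc_settings(host, database, user, password):
    """Return (jdbc_url, connection_properties) for a SQL Server style endpoint.

    Assumption: the host, port 1433 and URL options shown here fit your
    environment; adjust them to match your own workspace.
    """
    jdbc_url = (
        f"jdbc:sqlserver://{host}:1433;"
        f"database={database};encrypt=true;loginTimeout=30"
    )
    connection_properties = {
        "user": user,
        "password": password,
        # Driver class of the Microsoft JDBC driver for SQL Server.
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    return jdbc_url, connection_properties


def make_batch_writer(jdbc_url, table, connection_properties):
    """Build a foreachBatch callback bound to one JDBC target table."""

    def batch_write_jdbc(df, batch_id):
        # Any per-batch transformation of df would go here.
        # Append the micro-batch to the target table over JDBC.
        df.write.jdbc(
            jdbc_url,
            table=table,
            mode="append",
            properties=connection_properties,
        )

    return batch_write_jdbc
```

The function returned by make_batch_writer can be passed to foreachBatch in place of a function that reads global variables, which keeps the connection details in one place. Because the mode is "append", a micro-batch that is retried after a failure may be written twice, so the target table may need deduplication.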

Delta Table to Spark Streaming to Synapse Table in azure databricks
