Databricks

User16826994223 · ‎06-25-2021

Is there a way to keep my Synapse database always in sync with the latest data from a Delta table? I believe my Synapse database doesn't support a stream as a sink. Is there a workaround?

User16826994223 · ‎06-25-2021

You could try to keep the data in sync by appending each new micro-batch DataFrame inside a foreachBatch on your write stream. foreachBatch lets you write the data in arbitrary ways, so you can connect to the data warehouse over JDBC if necessary. The stream would look something like this:

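# Read the Delta table as a stream
df = spark.readStream \
          .format("delta") \
          .load(input_path)

# Hand every micro-batch to batch_write_jdbc; the checkpoint tracks progress
df_write = df.writeStream \
             .format("delta") \
             .foreachBatch(batch_write_jdbc) \
             .option("checkpointLocation", checkpoint) \
             .start("noop")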

The "noop" passed to start() is a dummy target: nothing is actually written there, but starting the stream kicks off the process that calls the batch function, and that function does the write over JDBC.

The batch function could be something like this:

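def batch_write_jdbc(df, batchId):
    # Placeholder: apply whatever transformation the micro-batch needs here.
    # df = df.<any transformation>
    # Append the micro-batch to the Synapse table over JDBC.
    df.write.jdbc(jdbc_url, table=schema_name + "." + table_name, mode="append", properties=connection_properties)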
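The snippets above leave jdbc_url, connection_properties, schema_name and table_name undefined. Here is a minimal sketch of one way they might be set up, assuming the Synapse SQL endpoint is reached through the Microsoft SQL Server JDBC driver. The helper names (build_synapse_jdbc_config, make_batch_writer) and the URL options are my own illustration and not part of the answer above, so adjust them to your workspace.

```python
def build_synapse_jdbc_config(server, database, user, password, port=1433):
    """Build the JDBC URL and properties dict (hypothetical helper).

    Assumes a SQL Server-style endpoint; the options in the URL are
    illustrative defaults, not requirements.
    """
    jdbc_url = (
        f"jdbc:sqlserver://{server}:{port};"
        f"database={database};encrypt=true;loginTimeout=30"
    )
    connection_properties = {
        "user": user,
        "password": password,
        "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    }
    return jdbc_url, connection_properties


def make_batch_writer(jdbc_url, schema_name, table_name, connection_properties):
    """Return a function with the (df, batchId) signature foreachBatch expects."""
    full_table_name = schema_name + "." + table_name

    def batch_write_jdbc(df, batchId):
        # Append the current micro-batch to the target table over JDBC.
        df.write.jdbc(
            jdbc_url,
            table=full_table_name,
            mode="append",
            properties=connection_properties,
        )

    return batch_write_jdbc
```

You would then pass the result of make_batch_writer(...) to .foreachBatch(...) in place of a module-level function. In practice the password would come from a secret store rather than a literal in the notebook.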


Delta Table to Spark Streaming to Synapse Table in azure databricks
