
Populating data from Databricks to SQL Server tables

Mr__D
New Contributor II

Hello All,

Could anyone please suggest the best way to populate (upsert) data from a Delta table into a SQL Server table?

We transform our data in Databricks and store it in a Delta table, but for reporting purposes we also need to populate a SQL Server table.

What is the best way to upsert data from a Delta table into SQL Server?

Would the Spark Synapse connector be the best option?

Thanks,

Deepak

1 REPLY

Anonymous
Not applicable

@Deepak Bhatt:

Yes, a Spark connector is a good option for this. One note: the Spark Synapse connector is specifically for Azure Synapse Analytics (formerly SQL Data Warehouse). For a regular SQL Server table, the Apache Spark connector for SQL Server (format com.microsoft.sqlserver.jdbc.spark) is the more direct choice; it supports both reads and fast bulk writes.

To upsert, you can first read the data from the Delta table using Spark's DataFrame API, then bulk-write it into a staging table in SQL Server with the connector. The connector has no built-in upsert mode, so the final step is a MERGE statement in SQL Server that folds the staging rows into the target table.

Here's some sample code for the first step: loading the Delta table and bulk-writing it into a SQL Server staging table with the connector:

# Read data from the Delta table
delta_df = spark.read.format("delta").load("/path/to/delta/table")
 
# Bulk-write the rows into a SQL Server staging table using the Apache Spark
# connector for SQL Server; overwriting it on each run keeps the MERGE simple
delta_df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", "jdbc:sqlserver://<your_server>:<your_port>;database=<your_database>") \
    .option("dbtable", "<staging_table>") \
    .option("user", "<your_username>") \
    .option("password", "<your_password>") \
    .option("tableLock", "true") \
    .option("batchsize", "100000") \
    .option("reliabilityLevel", "BEST_EFFORT") \
    .save()

Make sure to replace the placeholders with your own values for the SQL Server connection and table details. Since the connector itself cannot upsert, the write lands in a staging table and the actual upsert is done by a MERGE statement on the SQL Server side, keyed on your primary key column(s).

Note that you may need to adjust the other options based on your specific requirements and environment.
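
For the MERGE step itself, one common pattern on Databricks is to open a JDBC connection from the driver through py4j and execute the statement directly (the SQL Server JDBC driver ships with the Databricks runtime). A minimal sketch, assuming a hypothetical target table dbo.target_table, a staging table dbo.staging_table matching the dbtable used above, and a single key column id; substitute your own schema:

# Hypothetical schema: dbo.target_table and dbo.staging_table share the
# columns (id, col1, col2), with id as the business key. Adjust to yours.
merge_sql = """
MERGE dbo.target_table AS t
USING dbo.staging_table AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1, t.col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);
"""

# Execute the MERGE inside SQL Server via java.sql.DriverManager through py4j
jdbc_url = "jdbc:sqlserver://<your_server>:<your_port>;database=<your_database>"
conn = spark.sparkContext._gateway.jvm.java.sql.DriverManager.getConnection(
    jdbc_url, "<your_username>", "<your_password>")
try:
    stmt = conn.createStatement()
    stmt.executeUpdate(merge_sql)
    stmt.close()
finally:
    conn.close()

With this split, the bulk write stays fast and set-based, and the MERGE runs entirely inside SQL Server, so you avoid row-by-row updates over JDBC.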
