
Populating data from Databricks to SQL Server tables

Mr__D
New Contributor II

Hello All,

Could anyone please suggest the best way to populate (upsert) data from a Delta table into a SQL Server table?

We transform our data in Databricks and store it in a Delta table, but for reporting purposes we also need to populate a SQL Server table.

What is the best way to upsert data from a Delta table into SQL Server?

Would the Spark Synapse connector be the best option?

Thanks,

Deepak

1 REPLY

Anonymous
Not applicable

@Deepak Bhatt:

Yes, a Spark connector is a good option for this. One note: the Spark Synapse connector is specifically for Azure Synapse Analytics (formerly SQL Data Warehouse). For a regular SQL Server table, the Apache Spark connector for SQL Server (format com.microsoft.sqlserver.jdbc.spark) is the more direct choice; it supports both reads and fast bulk writes.

To upsert, you can first read the data from the Delta table using Spark's DataFrame API, then bulk-write it into a staging table in SQL Server with the connector. The connector has no built-in upsert mode, so the final step is a MERGE statement in SQL Server that folds the staging rows into the target table.

Here's some sample code for the first step: loading the Delta table and bulk-writing it into a SQL Server staging table with the connector:

# Read data from the Delta table
delta_df = spark.read.format("delta").load("/path/to/delta/table")
 
# Bulk-write the rows into a SQL Server staging table using the Apache Spark
# connector for SQL Server; overwriting it on each run keeps the MERGE simple
delta_df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", "jdbc:sqlserver://<your_server>:<your_port>;database=<your_database>") \
    .option("dbtable", "<staging_table>") \
    .option("user", "<your_username>") \
    .option("password", "<your_password>") \
    .option("tableLock", "true") \
    .option("batchsize", "100000") \
    .option("reliabilityLevel", "BEST_EFFORT") \
    .save()

Make sure to replace the placeholders with your own values for the SQL Server connection and table details. Since the connector itself cannot upsert, the write lands in a staging table and the actual upsert is done by a MERGE statement on the SQL Server side, keyed on your primary key column(s).

Note that you may need to adjust the other options based on your specific requirements and environment.
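
For the MERGE step itself, one common pattern on Databricks is to open a JDBC connection from the driver through py4j and execute the statement directly (the SQL Server JDBC driver ships with the Databricks runtime). A minimal sketch, assuming a hypothetical target table dbo.target_table, a staging table dbo.staging_table matching the dbtable used above, and a single key column id; substitute your own schema:

# Hypothetical schema: dbo.target_table and dbo.staging_table share the
# columns (id, col1, col2), with id as the business key. Adjust to yours.
merge_sql = """
MERGE dbo.target_table AS t
USING dbo.staging_table AS s
    ON t.id = s.id
WHEN MATCHED THEN
    UPDATE SET t.col1 = s.col1, t.col2 = s.col2
WHEN NOT MATCHED THEN
    INSERT (id, col1, col2) VALUES (s.id, s.col1, s.col2);
"""

# Execute the MERGE inside SQL Server via java.sql.DriverManager through py4j
jdbc_url = "jdbc:sqlserver://<your_server>:<your_port>;database=<your_database>"
conn = spark.sparkContext._gateway.jvm.java.sql.DriverManager.getConnection(
    jdbc_url, "<your_username>", "<your_password>")
try:
    stmt = conn.createStatement()
    stmt.executeUpdate(merge_sql)
    stmt.close()
finally:
    conn.close()

With this split, the bulk write stays fast and set-based, and the MERGE runs entirely inside SQL Server, so you avoid row-by-row updates over JDBC.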
