<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Populating data from databricks to sql server tables in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/populating-data-from-databricks-to-sql-server-tables/m-p/6428#M2577</link>
    <description>Forum thread: best way to populate (upsert) data from a Databricks Delta table into SQL Server tables. The full question and reply appear in the items below.</description>
    <pubDate>Fri, 07 Apr 2023 01:55:12 GMT</pubDate>
    <dc:creator>Anonymous</dc:creator>
    <dc:date>2023-04-07T01:55:12Z</dc:date>
    <item>
      <title>Populating data from databricks to sql server tables</title>
      <link>https://community.databricks.com/t5/data-engineering/populating-data-from-databricks-to-sql-server-tables/m-p/6427#M2576</link>
      <description>&lt;P&gt;Hello All,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Could anyone please suggest the best way to populate (upsert) data from a Delta table into a SQL Server table?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;We are transforming our data in Databricks and storing it in a Delta table, but for reporting purposes we need to populate the data into a SQL Server table as well.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What is the best way to upsert from the Delta table to SQL Server? Would the Spark Synapse connector be the best option?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Deepak&lt;/P&gt;</description>
      <pubDate>Tue, 04 Apr 2023 17:34:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/populating-data-from-databricks-to-sql-server-tables/m-p/6427#M2576</guid>
      <dc:creator>Mr__D</dc:creator>
      <dc:date>2023-04-04T17:34:05Z</dc:date>
    </item>
    <item>
      <title>Re: Populating data from databricks to sql server tables</title>
      <link>https://community.databricks.com/t5/data-engineering/populating-data-from-databricks-to-sql-server-tables/m-p/6428#M2577</link>
      <description>&lt;P&gt;@Deepak Bhatt​&amp;nbsp;:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The Spark Synapse connector is built for Azure Synapse Analytics (formerly SQL Data Warehouse), not for SQL Server itself. For writing from Databricks to a SQL Server table, the Apache Spark connector for SQL Server (format "com.microsoft.sqlserver.jdbc.spark") is the usual choice.&lt;/P&gt;&lt;P&gt;That connector only supports append and overwrite modes and has no upsert or primary-key option, so an upsert is normally done in two steps: read the Delta table with Spark's DataFrame API and write it to a staging table in SQL Server, then run a MERGE statement on the SQL Server side to upsert from the staging table into the target table.&lt;/P&gt;&lt;P&gt;Here's sample code sketching that pattern (the server, table, and column names are placeholders to replace with your own):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Read data from the Delta table
delta_df = spark.read.format("delta").load("/path/to/delta/table")

jdbc_url = "jdbc:sqlserver://&amp;lt;your_server&amp;gt;:&amp;lt;your_port&amp;gt;;database=&amp;lt;your_database&amp;gt;"

# Step 1: write the data to a staging table (overwritten on each run)
delta_df.write \
    .format("com.microsoft.sqlserver.jdbc.spark") \
    .mode("overwrite") \
    .option("url", jdbc_url) \
    .option("dbtable", "&amp;lt;your_staging_table&amp;gt;") \
    .option("user", "&amp;lt;your_username&amp;gt;") \
    .option("password", "&amp;lt;your_password&amp;gt;") \
    .option("batchsize", "100000") \
    .option("tableLock", "true") \
    .option("reliabilityLevel", "BEST_EFFORT") \
    .save()

# Step 2: MERGE from the staging table into the target table on the SQL Server side
merge_sql = """
MERGE INTO &amp;lt;your_table&amp;gt; AS target
USING &amp;lt;your_staging_table&amp;gt; AS source
    ON target.&amp;lt;your_primary_key&amp;gt; = source.&amp;lt;your_primary_key&amp;gt;
WHEN MATCHED THEN
    UPDATE SET target.&amp;lt;col&amp;gt; = source.&amp;lt;col&amp;gt;
WHEN NOT MATCHED THEN
    INSERT (&amp;lt;columns&amp;gt;) VALUES (source.&amp;lt;columns&amp;gt;);
"""

# Execute the MERGE through the JDBC driver already on the cluster
# (pyodbc or any other SQL client works here as well)
conn = spark._sc._jvm.java.sql.DriverManager.getConnection(
    jdbc_url, "&amp;lt;your_username&amp;gt;", "&amp;lt;your_password&amp;gt;")
stmt = conn.createStatement()
stmt.executeUpdate(merge_sql)
stmt.close()
conn.close()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Make sure to replace the placeholders with your own SQL Server connection and table details, and spell out the actual columns in the MERGE statement. You may also need to adjust the write options (batch size, table lock, reliability level) for your requirements and environment.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Fri, 07 Apr 2023 01:55:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/populating-data-from-databricks-to-sql-server-tables/m-p/6428#M2577</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2023-04-07T01:55:12Z</dc:date>
    </item>
  </channel>
</rss>

