MadhuB
Valued Contributor

Hi  @WYO 

If it is comparing against on-prem data, I would recommend an efficient approach as below. All you need is to establish a connection to on-prem database using the jdbc configuration as below.

 
Step 1 - Establish connection to the on-prem database
 
# On prem Sql server details

jdbcHostname = "OnPremServerName"
jdbcPort = "portnumber"
jdbcDatabase = "DatabaseName"
dbServerUserName = "serviceaccountuser"
dbServerPassword = dbutils.secrets.get("scope",key="serviceaccountuserpwd")

# Service account credentials string for the on prem sql server
properties = {"user" : dbServerUserName,"password" : dbServerPassword }

# connection string for the on prem sql server
url = "jdbc:sqlserver://{0}:{1};database={2};trustServerCertificate=true;loginTimeout=30;hostNameInCertificate=*.database.windows.net;".format(jdbcHostname,jdbcPort,jdbcDatabase)

# query a table from the on prem sql server
query = "(SELECT * FROM [dbo].[table]) datasource"

# read to a dataframe from the on prem sql server
df_onprem = spark.read.jdbc(url=url, table=query, properties=properties)

#view data
display(df_onprem)

 
Step 2 - Read delta table into the datafame

df_deltatable = spark.sql("select * from catalog.schema.table")

display(df_deltatable)

Step 3 - Compare both the dataframes

# Compare DataFrames for metadata/data mismatches
def compare_dataframes(df1, df2):

#schema comparison
if df1.schema != df2.schema:
print("Schemas are different.")
else:
print("Matching schemas")

#record count check
if df1.count() == df2.count():
print("Record counts match")

#data comparison
if df1.exceptAll(df2).isEmpty() and df2.exceptAll(df1).isEmpty():
print("DataFrames are identical.")
else:
print("DataFrames are not identical.")

# Show differences
print("Rows in df1 but not in df2:")
df1.exceptAll(df2).show()

print("Rows in df2 but not in df1:")
df2.exceptAll(df1).show()

compare_dataframes(df_onprem,df_deltatable)

 

My response is not getting through with formatted code (attachments fyr)

Let me know for anything, else, please mark this as a solution.