<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Export data from databricks to prem in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106873#M42621</link>
    <description>&lt;P&gt;Yes, you can export data from Databricks to an Excel sheet or a .csv file. Use the toPandas() method to convert your Spark DataFrame to a pandas DataFrame, then save it using to_csv() for a .csv file or a library like openpyxl for Excel. Once exported, you can compare the file with your on-premise SQL Server data to verify alignment.&lt;/P&gt;</description>
    <pubDate>Fri, 24 Jan 2025 05:59:02 GMT</pubDate>
    <dc:creator>nick533</dc:creator>
    <dc:date>2025-01-24T05:59:02Z</dc:date>
    <item>
      <title>Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106813#M42599</link>
      <description>&lt;P&gt;Hello everyone&lt;/P&gt;&lt;P&gt;I need to export some data to sql server management studio on premise.&lt;/P&gt;&lt;P&gt;I need to verify that the new data on databricks is aligned with the older data that we have on premise.&lt;/P&gt;&lt;P&gt;Is it possible to export data as an Excel sheet or .csv file?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;WYO&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jan 2025 14:05:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106813#M42599</guid>
      <dc:creator>WYO</dc:creator>
      <dc:date>2025-01-23T14:05:25Z</dc:date>
    </item>
    <item>
      <title>Re: Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106846#M42612</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145602"&gt;@WYO&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;If you are comparing against on-prem data, I would recommend the approach below. All you need is to &lt;STRONG&gt;establish a connection to the on-prem database using the JDBC configuration shown here.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Step 1 - Establish a connection to the on-prem database&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;# On-prem SQL Server details&lt;BR /&gt;jdbcHostname = "OnPremServerName"&lt;BR /&gt;jdbcPort = "portnumber"&lt;BR /&gt;jdbcDatabase = "DatabaseName"&lt;BR /&gt;dbServerUserName = "serviceaccountuser"&lt;BR /&gt;dbServerPassword = dbutils.secrets.get("scope", key="serviceaccountuserpwd")&lt;/P&gt;&lt;P&gt;# Service account credentials for the on-prem SQL Server&lt;BR /&gt;properties = {"user": dbServerUserName, "password": dbServerPassword}&lt;/P&gt;&lt;P&gt;# Connection string for the on-prem SQL Server&lt;BR /&gt;url = "jdbc:sqlserver://{0}:{1};database={2};trustServerCertificate=true;loginTimeout=30;".format(jdbcHostname, jdbcPort, jdbcDatabase)&lt;/P&gt;&lt;P&gt;# Query a table on the on-prem SQL Server&lt;BR /&gt;query = "(SELECT * FROM [dbo].[table]) datasource"&lt;/P&gt;&lt;P&gt;# Read the query result into a DataFrame&lt;BR /&gt;df_onprem = spark.read.jdbc(url=url, table=query, properties=properties)&lt;/P&gt;&lt;P&gt;# View the data&lt;BR /&gt;display(df_onprem)&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Step 2 - Read the Delta table into a DataFrame&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;df_deltatable = spark.sql("select * from catalog.schema.table")&lt;BR /&gt;display(df_deltatable)&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Step 3 - Compare the two DataFrames&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;# Compare DataFrames for metadata/data mismatches&lt;BR /&gt;def compare_dataframes(df1, df2):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Schema comparison&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if df1.schema != df2.schema:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Schemas are different.")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Matching schemas")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Record count check&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if df1.count() == df2.count():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Record counts match")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Record counts differ")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Data comparison&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if df1.exceptAll(df2).isEmpty() and df2.exceptAll(df1).isEmpty():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("DataFrames are identical.")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("DataFrames are not identical.")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Show differences&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Rows in df1 but not in df2:")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df1.exceptAll(df2).show()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Rows in df2 but not in df1:")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df2.exceptAll(df1).show()&lt;/P&gt;&lt;P&gt;compare_dataframes(df_onprem, df_deltatable)&lt;/P&gt;&lt;P&gt;My original reply did not get through with formatted code (see attachments).&lt;/P&gt;&lt;P&gt;Let me know if you need anything else; otherwise, &lt;STRONG&gt;please mark this as a solution.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jan 2025 19:10:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106846#M42612</guid>
      <dc:creator>MadhuB</dc:creator>
      <dc:date>2025-01-23T19:10:56Z</dc:date>
    </item>
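    <!-- The Step 3 logic in the post relies on Spark's exceptAll(). For checking the comparison logic locally without a cluster, here is a minimal pandas analogue of exceptAll (a sketch, not the poster's code; it treats each DataFrame as a multiset of rows, as Spark does):

    ```python
    import pandas as pd

    def except_all(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
        """Rows in df1 that are not in df2, duplicate-aware (like Spark's exceptAll)."""
        # Multiset counts of complete rows on each side.
        c1 = df1.value_counts()
        c2 = df2.value_counts()
        # Keep each row max(count_in_df1 - count_in_df2, 0) times.
        diff = c1.subtract(c2, fill_value=0).clip(lower=0).astype(int)
        diff = diff[diff > 0]
        out = diff.index.to_frame(index=False)
        return out.loc[out.index.repeat(diff.to_numpy())].reset_index(drop=True)
    ```

    If both except_all(df1, df2) and except_all(df2, df1) come back empty, the two tables hold the same rows, mirroring the isEmpty() checks in the post. -->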
    <item>
      <title>Re: Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106873#M42621</link>
      <description>&lt;P&gt;Yes, you can export data from Databricks to an Excel sheet or a .csv file. Use the toPandas() method to convert your Spark DataFrame to a pandas DataFrame, then save it using to_csv() for a .csv file or a library like openpyxl for Excel. Once exported, you can compare the file with your on-premise SQL Server data to verify alignment.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 05:59:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106873#M42621</guid>
      <dc:creator>nick533</dc:creator>
      <dc:date>2025-01-24T05:59:02Z</dc:date>
    </item>
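    <!-- A minimal sketch of the export path described above. On Databricks the first line would be pdf = df.toPandas(); here a plain pandas DataFrame stands in so the snippet runs anywhere (file path and column names are illustrative only):

    ```python
    import os
    import tempfile

    import pandas as pd

    # On Databricks: pdf = df.toPandas()
    pdf = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.5, 30.25]})

    path = os.path.join(tempfile.gettempdir(), "export_check.csv")
    pdf.to_csv(path, index=False)            # .csv export
    # pdf.to_excel(path.replace(".csv", ".xlsx"), index=False)  # Excel export; needs openpyxl

    # Round-trip to confirm the file on disk matches what was written.
    back = pd.read_csv(path)
    print(back.equals(pdf))
    ```

    Note that toPandas() pulls the whole table onto the driver, so this is only sensible for data small enough to fit in driver memory. -->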
    <item>
      <title>Re: Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106885#M42624</link>
      <description>&lt;P&gt;You can compare your Databricks data with on-prem SQL Server data in two ways.&lt;/P&gt;&lt;P&gt;First, establish a connection between SQL Server and Databricks using volumes. With volumes, you can mount the SQL Server data into Databricks Unity Catalog.&lt;/P&gt;&lt;P&gt;1. Compare the data you have in Databricks with the SQL Server data (the volume data) in Databricks itself.&lt;/P&gt;&lt;P&gt;2. Write the Databricks data into the volume (in SQL Server) and then compare the data on the SQL Server side.&lt;/P&gt;&lt;P&gt;Either way, it is best practice to use volumes rather than JDBC/ODBC, DBFS mounts (deprecated), or manually exporting tables, which are not recommended.&lt;/P&gt;&lt;P&gt;If you found this helpful, please mark it as a solution.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Avinash N&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 09:22:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106885#M42624</guid>
      <dc:creator>Avinash_Narala</dc:creator>
      <dc:date>2025-01-24T09:22:00Z</dc:date>
    </item>
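    <!-- Option 2 above ultimately reduces to diffing two exported files once both sides land in a shared location. A stdlib-only sketch of that file-level comparison (file names and sample rows are hypothetical; duplicates within a file are not counted separately here):

    ```python
    import csv
    import os
    import tempfile

    def diff_csv(path_a, path_b):
        """Return (rows only in A, rows only in B), order-insensitive."""
        with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
            rows_a = [tuple(r) for r in csv.reader(fa)]
            rows_b = [tuple(r) for r in csv.reader(fb)]
        only_a = [r for r in rows_a if r not in rows_b]
        only_b = [r for r in rows_b if r not in rows_a]
        return only_a, only_b

    def write_csv(path, rows):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)

    tmp = tempfile.gettempdir()
    a = os.path.join(tmp, "databricks_export.csv")
    b = os.path.join(tmp, "onprem_export.csv")
    write_csv(a, [["id", "v"], ["1", "x"], ["2", "y"]])
    write_csv(b, [["id", "v"], ["1", "x"], ["3", "z"]])

    only_a, only_b = diff_csv(a, b)
    print(only_a)  # rows only in the Databricks export
    print(only_b)  # rows only in the on-prem export
    ```

    Two empty lists mean the exports agree row for row. -->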
  </channel>
</rss>

