cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Export data from databricks to prem

WYO
New Contributor II

Hello everyone

I need to export some data to sql server management studio on premise.

I need to verify that the new data on databricks is aligned with the older data that we have on premise.

Is it possible to export data as an Excel sheet or .csv file?

Regards,

WYO

3 REPLIES 3

MadhuB
Contributor II

Hi  @WYO 

If it is comparing against on-prem data, I would recommend an efficient approach as below. All you need is to establish a connection to on-prem database using the jdbc configuration as below.

 
Step 1 - Establish connection to the on-prem database
 
# On prem Sql server details

jdbcHostname = "OnPremServerName"
jdbcPort = "portnumber"
jdbcDatabase = "DatabaseName"
dbServerUserName = "serviceaccountuser"
dbServerPassword = dbutils.secrets.get("scope",key="serviceaccountuserpwd")

# Service account credentials string for the on prem sql server
properties = {"user" : dbServerUserName,"password" : dbServerPassword }

# connection string for the on prem sql server
url = "jdbc:sqlserver://{0}:{1};database={2};trustServerCertificate=true;loginTimeout=30;hostNameInCertificate=*.database.windows.net;".format(jdbcHostname,jdbcPort,jdbcDatabase)

# query a table from the on prem sql server
query = "(SELECT * FROM [dbo].[table]) datasource"

# read to a dataframe from the on prem sql server
df_onprem = spark.read.jdbc(url=url, table=query, properties=properties)

#view data
display(df_onprem)

 
Step 2 - Read delta table into the datafame

df_deltatable = spark.sql("select * from catalog.schema.table")

display(df_deltatable)

Step 3 - Compare both the dataframes

# Compare DataFrames for metadata/data mismatches
def compare_dataframes(df1, df2):

#schema comparison
if df1.schema != df2.schema:
print("Schemas are different.")
else:
print("Matching schemas")

#record count check
if df1.count() == df2.count():
print("Record counts match")

#data comparison
if df1.exceptAll(df2).isEmpty() and df2.exceptAll(df1).isEmpty():
print("DataFrames are identical.")
else:
print("DataFrames are not identical.")

# Show differences
print("Rows in df1 but not in df2:")
df1.exceptAll(df2).show()

print("Rows in df2 but not in df1:")
df2.exceptAll(df1).show()

compare_dataframes(df_onprem,df_deltatable)

 

My response is not getting through with formatted code (attachments fyr)

Let me know for anything, else, please mark this as a solution.

nick533
New Contributor II

Yes, you can export data from Databricks to an Excel sheet or a .csv file. Use the method to convert your Spark DataFrame to a Pandas DataFrame, then save it using  for a .csv file or a library like  for Excel. Once exported, you can compare the file with your on-premise SQL Server data to verify alignment.

Avinash_Narala
Valued Contributor II

You can compare your databricks data with on-prem sql server data in two ways:

Firstly, you have to make connection between sql-server and databricks using volumes.  Using volumes we can mount sql server data into databricks unity catalog

1.we can compare the data we have in databricks and sql-server data(volume data) in databricks itself.

2. writing databricks data into volume(in sql-server) and then comparing the data in sql-server.

Either way make the best practice to use volumes instead of jdbc/odbc or dbfs mount(deprecated) and manually exporting tables, which are not recommended.

If you found this helpful, mark it as solution

Regards,

Avinash N

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group