<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Export data from databricks to prem in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106873#M42621</link>
    <description>&lt;P&gt;Yes, you can export data from Databricks to an Excel sheet or a .csv file. Use the toPandas() method to convert your Spark DataFrame to a pandas DataFrame, then save it using to_csv() for a .csv file or a library like openpyxl for Excel. Once exported, you can compare the file with your on-premise SQL Server data to verify alignment.&lt;/P&gt;</description>
    <pubDate>Fri, 24 Jan 2025 05:59:02 GMT</pubDate>
    <dc:creator>nick533</dc:creator>
    <dc:date>2025-01-24T05:59:02Z</dc:date>
    <item>
      <title>Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106813#M42599</link>
      <description>&lt;P&gt;Hello everyone&lt;/P&gt;&lt;P&gt;I need to export some data to sql server management studio on premise.&lt;/P&gt;&lt;P&gt;I need to verify that the new data on databricks is aligned with the older data that we have on premise.&lt;/P&gt;&lt;P&gt;Is it possible to export data as an Excel sheet or .csv file?&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;WYO&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jan 2025 14:05:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106813#M42599</guid>
      <dc:creator>WYO</dc:creator>
      <dc:date>2025-01-23T14:05:25Z</dc:date>
    </item>
    <item>
      <title>Re: Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106846#M42612</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/145602"&gt;@WYO&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;If you are comparing against on-prem data, I would recommend the approach below. All you need is to &lt;STRONG&gt;establish a connection to the on-prem database using the JDBC configuration shown here.&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Step 1 - Establish a connection to the on-prem database&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;# On-prem SQL Server details&lt;BR /&gt;jdbcHostname = "OnPremServerName"&lt;BR /&gt;jdbcPort = "portnumber"&lt;BR /&gt;jdbcDatabase = "DatabaseName"&lt;BR /&gt;dbServerUserName = "serviceaccountuser"&lt;BR /&gt;dbServerPassword = dbutils.secrets.get("scope", key="serviceaccountuserpwd")&lt;/P&gt;&lt;P&gt;# Service account credentials for the on-prem SQL Server&lt;BR /&gt;properties = {"user": dbServerUserName, "password": dbServerPassword}&lt;/P&gt;&lt;P&gt;# Connection string for the on-prem SQL Server&lt;BR /&gt;url = "jdbc:sqlserver://{0}:{1};database={2};trustServerCertificate=true;loginTimeout=30;".format(jdbcHostname, jdbcPort, jdbcDatabase)&lt;/P&gt;&lt;P&gt;# Query a table on the on-prem SQL Server&lt;BR /&gt;query = "(SELECT * FROM [dbo].[table]) datasource"&lt;/P&gt;&lt;P&gt;# Read the query result into a DataFrame&lt;BR /&gt;df_onprem = spark.read.jdbc(url=url, table=query, properties=properties)&lt;/P&gt;&lt;P&gt;# View the data&lt;BR /&gt;display(df_onprem)&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Step 2 - Read the Delta table into a DataFrame&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;df_deltatable = spark.sql("select * from catalog.schema.table")&lt;BR /&gt;display(df_deltatable)&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Step 3 - Compare the two DataFrames&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;# Compare DataFrames for metadata/data mismatches&lt;BR /&gt;def compare_dataframes(df1, df2):&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Schema comparison&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if df1.schema != df2.schema:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Schemas are different.")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Matching schemas")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Record count check&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if df1.count() == df2.count():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Record counts match")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Record counts differ")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Data comparison&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;if df1.exceptAll(df2).isEmpty() and df2.exceptAll(df1).isEmpty():&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("DataFrames are identical.")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;else:&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("DataFrames are not identical.")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;# Show differences&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Rows in df1 but not in df2:")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df1.exceptAll(df2).show()&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;print("Rows in df2 but not in df1:")&lt;BR /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;df2.exceptAll(df1).show()&lt;/P&gt;&lt;P&gt;compare_dataframes(df_onprem, df_deltatable)&lt;/P&gt;&lt;P&gt;My original reply did not get through with formatted code (see attachments).&lt;/P&gt;&lt;P&gt;Let me know if you need anything else; otherwise, &lt;STRONG&gt;please mark this as a solution.&lt;/STRONG&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jan 2025 19:10:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106846#M42612</guid>
      <dc:creator>MadhuB</dc:creator>
      <dc:date>2025-01-23T19:10:56Z</dc:date>
    </item>
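    <!-- The Step 3 logic in the post relies on Spark's exceptAll(). For checking the comparison logic locally without a cluster, here is a minimal pandas analogue of exceptAll (a sketch, not the poster's code; it treats each DataFrame as a multiset of rows, as Spark does):

    ```python
    import pandas as pd

    def except_all(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
        """Rows in df1 that are not in df2, duplicate-aware (like Spark's exceptAll)."""
        # Multiset counts of complete rows on each side.
        c1 = df1.value_counts()
        c2 = df2.value_counts()
        # Keep each row max(count_in_df1 - count_in_df2, 0) times.
        diff = c1.subtract(c2, fill_value=0).clip(lower=0).astype(int)
        diff = diff[diff > 0]
        out = diff.index.to_frame(index=False)
        return out.loc[out.index.repeat(diff.to_numpy())].reset_index(drop=True)
    ```

    If both except_all(df1, df2) and except_all(df2, df1) come back empty, the two tables hold the same rows, mirroring the isEmpty() checks in the post. -->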
    <item>
      <title>Re: Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106873#M42621</link>
      <description>&lt;P&gt;Yes, you can export data from Databricks to an Excel sheet or a .csv file. Use the toPandas() method to convert your Spark DataFrame to a pandas DataFrame, then save it using to_csv() for a .csv file or a library like openpyxl for Excel. Once exported, you can compare the file with your on-premise SQL Server data to verify alignment.&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 05:59:02 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106873#M42621</guid>
      <dc:creator>nick533</dc:creator>
      <dc:date>2025-01-24T05:59:02Z</dc:date>
    </item>
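    <!-- A minimal sketch of the export path described above. On Databricks the first line would be pdf = df.toPandas(); here a plain pandas DataFrame stands in so the snippet runs anywhere (file path and column names are illustrative only):

    ```python
    import os
    import tempfile

    import pandas as pd

    # On Databricks: pdf = df.toPandas()
    pdf = pd.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.5, 30.25]})

    path = os.path.join(tempfile.gettempdir(), "export_check.csv")
    pdf.to_csv(path, index=False)            # .csv export
    # pdf.to_excel(path.replace(".csv", ".xlsx"), index=False)  # Excel export; needs openpyxl

    # Round-trip to confirm the file on disk matches what was written.
    back = pd.read_csv(path)
    print(back.equals(pdf))
    ```

    Note that toPandas() pulls the whole table onto the driver, so this is only sensible for data small enough to fit in driver memory. -->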
    <item>
      <title>Re: Export data from databricks to prem</title>
      <link>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106885#M42624</link>
      <description>&lt;P&gt;You can compare your Databricks data with on-prem SQL Server data in two ways.&lt;/P&gt;&lt;P&gt;First, establish a connection between SQL Server and Databricks using volumes. With volumes, you can mount the SQL Server data into Databricks Unity Catalog.&lt;/P&gt;&lt;P&gt;1. Compare the data you have in Databricks with the SQL Server data (the volume data) in Databricks itself.&lt;/P&gt;&lt;P&gt;2. Write the Databricks data into the volume (in SQL Server) and then compare the data on the SQL Server side.&lt;/P&gt;&lt;P&gt;Either way, it is best practice to use volumes rather than JDBC/ODBC, DBFS mounts (deprecated), or manually exporting tables, which are not recommended.&lt;/P&gt;&lt;P&gt;If you found this helpful, please mark it as a solution.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Avinash N&lt;/P&gt;</description>
      <pubDate>Fri, 24 Jan 2025 09:22:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/export-data-from-databricks-to-prem/m-p/106885#M42624</guid>
      <dc:creator>Avinash_Narala</dc:creator>
      <dc:date>2025-01-24T09:22:00Z</dc:date>
    </item>
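    <!-- Option 2 above ultimately reduces to diffing two exported files once both sides land in a shared location. A stdlib-only sketch of that file-level comparison (file names and sample rows are hypothetical; duplicates within a file are not counted separately here):

    ```python
    import csv
    import os
    import tempfile

    def diff_csv(path_a, path_b):
        """Return (rows only in A, rows only in B), order-insensitive."""
        with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
            rows_a = [tuple(r) for r in csv.reader(fa)]
            rows_b = [tuple(r) for r in csv.reader(fb)]
        only_a = [r for r in rows_a if r not in rows_b]
        only_b = [r for r in rows_b if r not in rows_a]
        return only_a, only_b

    def write_csv(path, rows):
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(rows)

    tmp = tempfile.gettempdir()
    a = os.path.join(tmp, "databricks_export.csv")
    b = os.path.join(tmp, "onprem_export.csv")
    write_csv(a, [["id", "v"], ["1", "x"], ["2", "y"]])
    write_csv(b, [["id", "v"], ["1", "x"], ["3", "z"]])

    only_a, only_b = diff_csv(a, b)
    print(only_a)  # rows only in the Databricks export
    print(only_b)  # rows only in the on-prem export
    ```

    Two empty lists mean the exports agree row for row. -->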
  </channel>
</rss>

