<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Anti pattern : moving data from cloud to on-prem in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33032#M24122</link>
    <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;In my current project:&lt;/P&gt;&lt;P&gt;Current status: Azure Databricks streaming jobs move JSON files from Kafka to the raw layer (Parquet files); then parsing logic is applied and 8 tables are created in the raw standardized layer.&lt;/P&gt;&lt;P&gt;Requirement: The business team wants to access this data from an on-prem SQL Server, and hence proposes to follow an anti-pattern (moving data from cloud to on-prem), as they cannot access Databricks due to technical limitations.&lt;/P&gt;&lt;P&gt;How can this be achieved? Using ADF?&lt;/P&gt;</description>
    <pubDate>Tue, 30 Aug 2022 20:40:47 GMT</pubDate>
    <dc:creator>Ruby8376</dc:creator>
    <dc:date>2022-08-30T20:40:47Z</dc:date>
    <item>
      <title>Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33032#M24122</link>
      <description>&lt;P&gt;Hi there,&lt;/P&gt;&lt;P&gt;In my current project:&lt;/P&gt;&lt;P&gt;Current status: Azure Databricks streaming jobs move JSON files from Kafka to the raw layer (Parquet files); then parsing logic is applied and 8 tables are created in the raw standardized layer.&lt;/P&gt;&lt;P&gt;Requirement: The business team wants to access this data from an on-prem SQL Server, and hence proposes to follow an anti-pattern (moving data from cloud to on-prem), as they cannot access Databricks due to technical limitations.&lt;/P&gt;&lt;P&gt;How can this be achieved? Using ADF?&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 20:40:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33032#M24122</guid>
      <dc:creator>Ruby8376</dc:creator>
      <dc:date>2022-08-30T20:40:47Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33033#M24123</link>
      <description>&lt;P&gt;@Werner Stinckens, can you help?&lt;/P&gt;</description>
      <pubDate>Tue, 30 Aug 2022 20:42:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33033#M24123</guid>
      <dc:creator>Ruby8376</dc:creator>
      <dc:date>2022-08-30T20:42:58Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33034#M24124</link>
      <description>&lt;P&gt;You could indeed use ADF to copy the data from cloud to on-prem.&lt;/P&gt;&lt;P&gt;However, depending on the size of the data, this can take a while.&lt;/P&gt;&lt;P&gt;I use the same pattern, but for aggregated, processed data, where it is not an issue at all.&lt;/P&gt;&lt;P&gt;You could also look at Azure Synapse Serverless or Delta Sharing, or even SQL Hybrid.&lt;/P&gt;&lt;P&gt;But if the data MUST be on-prem, a copy seems to be the only way.&lt;/P&gt;&lt;P&gt;In that case I'd try to use PolyBase for the copy (I think recent versions of SQL Server can use this).&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;But I would strongly advise against moving raw data on-prem.&lt;/P&gt;&lt;P&gt;There are several ways to query cloud data using SQL (maybe not SQL Server itself, but Azure Synapse Serverless, for example, also uses T-SQL).&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 07:04:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33034#M24124</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-08-31T07:04:29Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33036#M24126</link>
      <description>&lt;P&gt;Hey, thank you so much @Werner Stinckens.&lt;/P&gt;&lt;P&gt;Yes, moving the data on-prem is a must. Can you please share links or a guide for this?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I was going through the link below, and it looks like it is not going to be a direct copy from Delta Lake. Would I need to use an interim Azure storage instance (staged copy from Delta Lake)?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-databricks-delta-lake?tabs=data-factory" alt="https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-databricks-delta-lake?tabs=data-factory" target="_blank"&gt;https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-databricks-delta-lake?tabs=data-factory&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 31 Aug 2022 22:16:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33036#M24126</guid>
      <dc:creator>Ruby8376</dc:creator>
      <dc:date>2022-08-31T22:16:23Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33037#M24127</link>
      <description>&lt;P&gt;I think so. The staged copy is probably a Parquet version (or something similar) of the Delta Lake table (I do not copy Delta Lake tables to an on-prem RDBMS at the moment).&lt;/P&gt;&lt;P&gt;If the tables to be copied are not Delta Lake but Parquet, the staging is not necessary.&lt;/P&gt;&lt;P&gt;The staging does not take a lot of time though, certainly not compared to inserting into SQL Server!&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 14:23:17 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33037#M24127</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-01T14:23:17Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33038#M24128</link>
      <description>&lt;P&gt;Currently, data is read from JSON files and landed in the raw layer in Parquet format. After that, one table is created into which all the data is inserted (one common raw standardized table); then, using parsing logic, different tables are created.&lt;/P&gt;&lt;P&gt;What would be your suggestion for moving this data to an on-prem RDBMS?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 14:53:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33038#M24128</guid>
      <dc:creator>Ruby8376</dc:creator>
      <dc:date>2022-09-01T14:53:43Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33039#M24129</link>
      <description>&lt;P&gt;It depends on the use case. If your colleagues want to do ad hoc analysis on these parsed tables, then yes. But if they have a specific use case (or several), I'd prepare/transform/aggregate the data first and send that to SQL Server.&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 14:56:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33039#M24129</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-01T14:56:43Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33040#M24130</link>
      <description>&lt;P&gt;Agreed! My only concern is the parsing logic that would have to be applied while writing data to the on-prem SQL Server. Should we move this data to Azure SQL first?&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 16:28:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33040#M24130</guid>
      <dc:creator>Ruby8376</dc:creator>
      <dc:date>2022-09-01T16:28:53Z</dc:date>
    </item>
    <item>
      <title>Re: Anti pattern : moving data from cloud to on-prem</title>
      <link>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33041#M24131</link>
      <description>&lt;P&gt;I'd apply all logic in Databricks/Spark, as there you have the advantage of parallel processing. Write the prepared data to AZ, so that no transformations have to be done in the RDBMS.&lt;/P&gt;</description>
      <pubDate>Mon, 05 Sep 2022 07:13:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/anti-pattern-moving-data-from-cloud-to-on-prem/m-p/33041#M24131</guid>
      <dc:creator>-werners-</dc:creator>
      <dc:date>2022-09-05T07:13:08Z</dc:date>
    </item>
  </channel>
</rss>

