<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Read Files from Adobe and Push to Delta table ADLS Gen2 in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130466#M48802</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/143693"&gt;@Pratikmsbsvm&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Okay, since you’re going to use Databricks compute for data extraction and you wrote that your workspace is deployed with the secure cluster connectivity &lt;STRONG&gt;(NPIP)&lt;/STRONG&gt; option enabled, you first need to make sure that you have a stable egress IP address.&lt;/P&gt;&lt;P&gt;Assuming that your workspace uses VNet injection (and not a managed VNet), to add explicit outbound methods for your workspace, use an Azure NAT gateway or user-defined routes (UDRs):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Azure NAT gateway&lt;/STRONG&gt;: Use an Azure NAT gateway to provide outbound internet connectivity for your deployments with a stable egress public IP. Configure the gateway on both of the workspace's subnets to ensure that all outbound traffic to the Azure backbone and public network transits through it. Clusters have a stable egress public IP, and you can modify the configuration for custom egress needs. You can configure this using either an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.databricks/databricks-all-in-one-template-for-vnet-injection-with-nat-gateway" target="_blank" rel="noopener"&gt;Azure template&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or from the Azure portal.&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;UDRs&lt;/STRONG&gt;: Use UDRs if your deployments require complex routing requirements or your workspaces use VNet injection with an egress firewall. UDRs ensure that network traffic is routed correctly for your workspace, either directly to the required endpoints or through an egress firewall. 
To use UDRs, you must add direct routes or allowed firewall rules for the Azure Databricks secure cluster connectivity relay and other required endpoints listed at&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/udr" target="_blank" rel="noopener"&gt;User-defined route settings for Azure Databricks&lt;/A&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Once you have the stable egress IP issue sorted out, you will then need to write code to fetch the data from Adobe and save it to ADLS.&lt;BR /&gt;If your source data is in one of the following formats, I recommend using Auto Loader:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;avro : Avro files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;binaryFile : Binary files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;csv : CSV files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;json : JSON files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;orc : ORC files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;parquet : Parquet files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;text : TXT files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;xml : XML files&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage. It provides a Structured Streaming source called &lt;STRONG&gt;cloudFiles&lt;/STRONG&gt;. Put simply, it automatically detects new files arriving in the data lake and processes only those files, with exactly-once semantics.&lt;/P&gt;&lt;P&gt;You can connect Auto Loader with a &lt;STRONG&gt;file arrival trigger&lt;/STRONG&gt;. 
When new files arrive in the storage location, the trigger automatically starts the workflow, which processes them using the Auto Loader mechanism described above.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/jobs/file-arrival-triggers" target="_blank" rel="noopener"&gt;Trigger jobs when new files arrive - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Tue, 02 Sep 2025 08:26:20 GMT</pubDate>
    <dc:creator>szymon_dybczak</dc:creator>
    <dc:date>2025-09-02T08:26:20Z</dc:date>
    <item>
      <title>Read Files from Adobe and Push to Delta table ADLS Gen2</title>
      <link>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130399#M48775</link>
      <description>&lt;P&gt;The upstream system sends 2 files with different schemas.&amp;nbsp;&lt;/P&gt;&lt;P&gt;The Storage Account has Private Endpoints.&amp;nbsp;&lt;SPAN&gt;There is &lt;STRONG&gt;no public access.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;No public IP (NPIP) = yes.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;How can I design this using only Databricks:&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;1. Use a Databricks API to read the data file from Adobe and push it to an ADLS container.&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;&lt;STRONG&gt;2. Pull new data files whenever they are available (polling or pulling).&lt;/STRONG&gt;&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Pratikmsbsvm_0-1756741451588.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/19549iBE668316E7278B0E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Pratikmsbsvm_0-1756741451588.png" alt="Pratikmsbsvm_0-1756741451588.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;3. I want to replace Event Grid and the Function App with Databricks. Please advise how to do that.&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Mon, 01 Sep 2025 15:45:41 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130399#M48775</guid>
      <dc:creator>Pratikmsbsvm</dc:creator>
      <dc:date>2025-09-01T15:45:41Z</dc:date>
    </item>
    <item>
      <title>Re: Read Files from Adobe and Push to Delta table ADLS Gen2</title>
      <link>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130448#M48799</link>
      <description>&lt;P&gt;Hello &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/143693"&gt;@Pratikmsbsvm&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Good day&lt;/P&gt;&lt;P&gt;Here is the design for your requirements.&amp;nbsp;&lt;/P&gt;&lt;H3 id="toc-hId-1421353024"&gt;&lt;SPAN&gt;Recommended Architecture (High-Level View)&lt;/SPAN&gt;&lt;/H3&gt;&lt;P class="lia-align-center"&gt;&lt;SPAN&gt;[ SAP / Salesforce / Adobe ]&lt;BR /&gt;│&lt;BR /&gt;▼&lt;BR /&gt;Ingestion Layer (via ADF / Synapse / Partner Connectors / REST API)&lt;BR /&gt;│&lt;BR /&gt;▼&lt;BR /&gt;┌───────────────────────────┐&lt;BR /&gt;│ Azure Data Lake Gen2 │ (&lt;STRONG&gt;Storage&lt;/STRONG&gt;&amp;nbsp;layer - centralized)&lt;BR /&gt;│ + Delta Lake for ACID │&lt;BR /&gt;└───────────────────────────┘&lt;BR /&gt;│&lt;BR /&gt;▼&lt;BR /&gt;Azure Databricks (&lt;STRONG&gt;Primary&lt;/STRONG&gt;&amp;nbsp;Workspace)&lt;BR /&gt;├─ Bronze: Raw Data&lt;BR /&gt;├─ Silver: Cleaned &amp;amp; Transformed&lt;BR /&gt;└─ Gold: Aggregated / Business Logic Applied&lt;BR /&gt;│&lt;BR /&gt;├──&amp;gt;&amp;nbsp;&lt;STRONG&gt;Load to&lt;/STRONG&gt;&amp;nbsp;Hightouch / Mad Mobile (via REST APIs / Hightouch Sync)&lt;BR /&gt;└──&amp;gt; Share curated Delta Tables to Other Databricks Workspace (via Delta Sharing or External Table Mount)&lt;/SPAN&gt;&lt;/P&gt;&lt;H2 id="toc-hId--934290432"&gt;Key Components &amp;amp; Patterns&lt;/H2&gt;&lt;H3 id="toc-hId-612006398"&gt;1.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Ingestion Options&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Azure Data Factory&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Partner Connectors&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(like Fivetran, which we often use in our projects) to ingest data from:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;SAP&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;→ 
via OData / RFC connectors&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Salesforce&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;→ via REST/Bulk API&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Adobe&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;→ via API or S3 data export&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3 id="toc-hId--1940150563"&gt;2.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Storage &amp;amp; Processing Layer&lt;/STRONG&gt;&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Store all raw and processed data in&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;ADLS Gen2&lt;/STRONG&gt;, with&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Delta Lake format&lt;/STRONG&gt;&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Organize Lakehouse zones:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Bronze&lt;/STRONG&gt;: Raw ingested files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Silver&lt;/STRONG&gt;: Cleaned &amp;amp; de-duplicated&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Gold&lt;/STRONG&gt;: Ready for consumption (BI / API sync)&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3 id="toc-hId--197340228"&gt;&lt;STRONG&gt;Cross-Workspace Databricks Access (This is Your Core Challenge and most important)&lt;/STRONG&gt;&lt;/H3&gt;&lt;H3&gt;&lt;STRONG&gt;Delta Sharing&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(Recommended if in different orgs/subscriptions)&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Securely share Delta tables&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;from one workspace to another without copying data&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Works across different cloud accounts&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H2 id="toc-hId--1619519975"&gt;Governance / Security Recommendations&lt;/H2&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;Unity Catalog&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;(if 
available) for fine-grained access control&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Encrypt data at rest (ADLS) and in transit&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;Use&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;service principals&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;STRONG&gt;managed identities&lt;/STRONG&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;for secure access between services&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3 id="toc-hId--73223145"&gt;&lt;SPAN&gt;Summary Visual (Simplified)&lt;/SPAN&gt;&lt;/H3&gt;&lt;PRE&gt;&lt;SPAN&gt; Sources →           Ingestion →    Delta Lakehouse →            Destinations&lt;BR /&gt;[SAP, SFDC, Adobe]   [ADF, APIs]    [Bronze, Silver, Gold]      [Hightouch, Mad Mobile, Other DBX]&lt;BR /&gt;                                      ▲&lt;BR /&gt;                                      │&lt;BR /&gt;                                  Cross-Workspace Access (Delta Sharing / Mounting / Jobs)&lt;BR /&gt;&lt;/SPAN&gt;&lt;/PRE&gt;&lt;P&gt;Let me know if this helps.&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV class=""&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV class=""&gt;Do you have any API details or code for connecting from Adobe to Databricks?&lt;/DIV&gt;</description>
      <pubDate>Tue, 02 Sep 2025 07:04:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130448#M48799</guid>
      <dc:creator>Khaja_Zaffer</dc:creator>
      <dc:date>2025-09-02T07:04:20Z</dc:date>
    </item>
    <item>
      <title>Re: Read Files from Adobe and Push to Delta table ADLS Gen2</title>
      <link>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130466#M48802</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/143693"&gt;@Pratikmsbsvm&lt;/a&gt;&amp;nbsp;,&lt;/P&gt;&lt;P&gt;Okay, since you’re going to use Databricks compute for data extraction and you wrote that your workspace is deployed with the secure cluster connectivity &lt;STRONG&gt;(NPIP)&lt;/STRONG&gt; option enabled, you first need to make sure that you have a stable egress IP address.&lt;/P&gt;&lt;P&gt;Assuming that your workspace uses VNet injection (and not a managed VNet), to add explicit outbound methods for your workspace, use an Azure NAT gateway or user-defined routes (UDRs):&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Azure NAT gateway&lt;/STRONG&gt;: Use an Azure NAT gateway to provide outbound internet connectivity for your deployments with a stable egress public IP. Configure the gateway on both of the workspace's subnets to ensure that all outbound traffic to the Azure backbone and public network transits through it. Clusters have a stable egress public IP, and you can modify the configuration for custom egress needs. You can configure this using either an&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://github.com/Azure/azure-quickstart-templates/tree/master/quickstarts/microsoft.databricks/databricks-all-in-one-template-for-vnet-injection-with-nat-gateway" target="_blank" rel="noopener"&gt;Azure template&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;or from the Azure portal.&lt;/LI&gt;&lt;/UL&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;UDRs&lt;/STRONG&gt;: Use UDRs if your deployments require complex routing requirements or your workspaces use VNet injection with an egress firewall. UDRs ensure that network traffic is routed correctly for your workspace, either directly to the required endpoints or through an egress firewall. 
To use UDRs, you must add direct routes or allowed firewall rules for the Azure Databricks secure cluster connectivity relay and other required endpoints listed at&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/security/network/classic/udr" target="_blank" rel="noopener"&gt;User-defined route settings for Azure Databricks&lt;/A&gt;.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Once you have the stable egress IP issue sorted out, you will then need to write code to fetch the data from Adobe and save it to ADLS.&lt;BR /&gt;If your source data is in one of the following formats, I recommend using Auto Loader:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;avro : Avro files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;binaryFile : Binary files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;csv : CSV files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;json : JSON files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;orc : ORC files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;parquet : Parquet files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;text : TXT files&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;xml : XML files&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage. It provides a Structured Streaming source called &lt;STRONG&gt;cloudFiles&lt;/STRONG&gt;. Put simply, it automatically detects new files arriving in the data lake and processes only those files, with exactly-once semantics.&lt;/P&gt;&lt;P&gt;You can connect Auto Loader with a &lt;STRONG&gt;file arrival trigger&lt;/STRONG&gt;. 
When new files arrive in the storage location, the trigger automatically starts the workflow, which processes them using the Auto Loader mechanism described above.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://learn.microsoft.com/en-us/azure/databricks/jobs/file-arrival-triggers" target="_blank" rel="noopener"&gt;Trigger jobs when new files arrive - Azure Databricks | Microsoft Learn&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Sep 2025 08:26:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/read-files-from-adobe-and-push-to-delta-table-adls-gen2/m-p/130466#M48802</guid>
      <dc:creator>szymon_dybczak</dc:creator>
      <dc:date>2025-09-02T08:26:20Z</dc:date>
    </item>
  </channel>
</rss>

