Hi @Pratikmsbsvm,
Since you're going to use Databricks compute for data extraction, and you wrote that your workspace is deployed with the secure cluster connectivity (SCC, also known as NPIP) option enabled, you first need to make sure that you have a stable egress IP address.
Assuming your workspace uses VNet injection (not a managed VNet), add an explicit outbound method for your workspace using either an Azure NAT gateway or user-defined routes (UDRs):
- Azure NAT gateway: Use an Azure NAT gateway to provide outbound internet connectivity for your deployments with a stable egress public IP. Configure the gateway on both of the workspace's subnets so that all outbound traffic to the Azure backbone and public network transits through it. Clusters then share a stable egress public IP, and you can modify the configuration for custom egress needs. You can set this up with an Azure template or from the Azure portal (a minimal scripting sketch follows this list).
- UDRs: Use UDRs if your deployment has complex routing requirements or your workspace uses VNet injection with an egress firewall. UDRs ensure that network traffic is routed correctly for your workspace, either directly to the required endpoints or through an egress firewall. To use UDRs, you must add direct routes or allowed firewall rules for the Azure Databricks secure cluster connectivity relay and the other required endpoints listed under "User-defined route settings for Azure Databricks".
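If you go the NAT gateway route and want to script it rather than click through the portal, here is a minimal sketch using the azure-mgmt-network Python SDK. The resource group, region, VNet, and subnet names are placeholders for your environment:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# All names below (resource group, region, VNet, subnets) are placeholders.
credential = DefaultAzureCredential()
network = NetworkManagementClient(credential, "<subscription-id>")

RG, LOCATION, VNET = "dbx-rg", "eastus", "dbx-vnet"

# 1. A static Standard-SKU public IP -- this becomes the stable egress address.
ip = network.public_ip_addresses.begin_create_or_update(
    RG, "dbx-egress-ip",
    {"location": LOCATION,
     "sku": {"name": "Standard"},
     "public_ip_allocation_method": "Static"},
).result()

# 2. The NAT gateway, backed by that public IP.
nat = network.nat_gateways.begin_create_or_update(
    RG, "dbx-nat",
    {"location": LOCATION,
     "sku": {"name": "Standard"},
     "public_ip_addresses": [{"id": ip.id}]},
).result()

# 3. Attach the gateway to BOTH workspace subnets so all outbound
#    traffic transits through it.
for subnet_name in ("dbx-host-subnet", "dbx-container-subnet"):
    subnet = network.subnets.get(RG, VNET, subnet_name)
    subnet.nat_gateway = {"id": nat.id}
    network.subnets.begin_create_or_update(RG, VNET, subnet_name, subnet).result()

print(f"Stable egress IP: {ip.ip_address}")
```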
Once you have the stable egress IP issue sorted out, you will then need to write code to fetch the data from Adobe and save it to ADLS.
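As a rough illustration, a notebook cell like the following could pull a report from Adobe's API and land the raw JSON in ADLS. The endpoint, request body, secret scope, and storage paths are all assumptions you would replace with your actual Adobe integration details:

```python
import json
from datetime import datetime, timezone

import requests

# Hypothetical Adobe endpoint and request body -- replace with the details
# of your actual Adobe API integration and auth flow.
ADOBE_URL = "https://analytics.adobe.io/api/<company-id>/reports"
token = dbutils.secrets.get(scope="adobe", key="access-token")  # assumed secret scope

resp = requests.post(
    ADOBE_URL,
    headers={"Authorization": f"Bearer {token}"},
    json={"rsid": "<report-suite-id>"},  # report definition goes here
    timeout=60,
)
resp.raise_for_status()

# Land the raw payload in ADLS so Auto Loader can pick it up later.
landing_path = "abfss://raw@<storage-account>.dfs.core.windows.net/adobe/"
stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
dbutils.fs.put(f"{landing_path}report_{stamp}.json",
               json.dumps(resp.json()), True)
```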
If your source data lands in one of the formats Auto Loader supports (JSON, CSV, XML, Avro, ORC, Parquet, text, or binary files), I recommend using Auto Loader.
Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage. It provides a Structured Streaming source called cloudFiles. In short, it automatically detects new files arriving in the data lake and processes only those new files, with exactly-once semantics.
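A minimal sketch of such a stream, assuming the hypothetical landing path from above and a bronze Delta table as the target:

```python
# Incremental ingest of the landing folder into a bronze Delta table.
# Paths and table name are the same placeholders used above.
base = "abfss://raw@<storage-account>.dfs.core.windows.net"

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "json")
      .option("cloudFiles.schemaLocation", f"{base}/_schemas/adobe")
      .load(f"{base}/adobe/"))

(df.writeStream
   .option("checkpointLocation", f"{base}/_checkpoints/adobe")
   .trigger(availableNow=True)  # process everything new, then stop
   .toTable("bronze.adobe_reports"))
```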
You can pair Auto Loader with a file arrival trigger: when new files land in the storage location, an event is generated that automatically starts the workflow, which then processes the new files using the Auto Loader mechanism described above.
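To wire the trigger up programmatically rather than through the UI, here is a hedged sketch against the Jobs 2.1 REST API, which supports file arrival triggers. The workspace URL, token, notebook path, and cluster ID are placeholders:

```python
import requests

# Workspace URL, PAT token, notebook path, and cluster ID are placeholders.
host = "https://<workspace-url>"
token = "<pat-token>"

job_spec = {
    "name": "process-adobe-files",
    "trigger": {
        "pause_status": "UNPAUSED",
        # Fires whenever a new file lands under this path.
        "file_arrival": {"url": "abfss://raw@<storage-account>.dfs.core.windows.net/adobe/"},
    },
    "tasks": [{
        "task_key": "autoloader_ingest",
        "notebook_task": {"notebook_path": "/Repos/etl/adobe_autoloader"},
        "existing_cluster_id": "<cluster-id>",
    }],
}

resp = requests.post(f"{host}/api/2.1/jobs/create",
                     headers={"Authorization": f"Bearer {token}"},
                     json=job_spec, timeout=30)
resp.raise_for_status()
print(resp.json())  # {'job_id': ...}
```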
Trigger jobs when new files arrive - Azure Databricks | Microsoft Learn