Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Adobe Campaign to Azure Databricks file transfer

Datalight
New Contributor III

I have to create a data pipeline that pushes data (2 JSON files) from Adobe (source) to ADLS Gen2 using a cron job.

[Attached image: Datalight_0-1755698274906.png]

1. How will my ADLS Gen2 container know that a new file has arrived from Adobe? I am using Databricks as the orchestrator and ETL tool.

2. What should my network and security approach be when data moves from Adobe to ADLS Gen2?

3. There are two files: delivery and tracking. What is the best folder structure (separate or same folder)? I have created separate folders, but I am not sure that is a best practice.

Please help

Thanks a lot

9 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @Datalight ,

1. You can use file arrival triggers for this scenario (a sketch follows after this list):

 Trigger jobs when new files arrive - Azure Databricks | Microsoft Learn

2. This question is stated too broadly; we don't know anything about your current setup. For instance, is your workspace VNet-injected with SCC enabled?

3. I would put them in separate folders, especially if their schemas differ (see the example layout below).
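
For question 1, here is a minimal sketch of creating a file-arrival-triggered job with the Databricks Python SDK; the job name, notebook path, and storage URL are hypothetical placeholders:

```python
# Minimal sketch using the Databricks Python SDK (databricks-sdk).
# Job name, notebook path, and ADLS URL are hypothetical placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="adobe-ingest-on-file-arrival",
    tasks=[
        jobs.Task(
            task_key="ingest",
            notebook_task=jobs.NotebookTask(
                notebook_path="/Workspace/pipelines/ingest_adobe"
            ),
        )
    ],
    trigger=jobs.TriggerSettings(
        # Fires when new files land in this ADLS Gen2 path.
        file_arrival=jobs.FileArrivalTriggerConfiguration(
            url="abfss://landing@<storage-account>.dfs.core.windows.net/adobe/"
        ),
        pause_status=jobs.PauseStatus.UNPAUSED,
    ),
)
```

Note that file arrival triggers on cloud storage require the monitored path to be backed by a Unity Catalog external location.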
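
For question 3, one possible layout, with hypothetical container and folder names, keeping the delivery and tracking feeds in separate date-partitioned folders:

```
abfss://landing@<storage-account>.dfs.core.windows.net/adobe/
├── delivery/
│   └── ingest_date=2025-08-20/
│       └── delivery_001.json
└── tracking/
    └── ingest_date=2025-08-20/
        └── tracking_001.json
```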

@szymon_dybczak : Thanks a lot. It is VNet-injected. All the resources are running on private subnets.

Please let me know if you need any other information.

Thanks a lot

szymon_dybczak
Esteemed Contributor III

Hi @Datalight ,

Yes, I have 2 questions:

1. From where will your cron job be executed?

2. Does your ADLS account have public network access disabled?

 

@szymon_dybczak : 

1. From where will your cron job be executed?

Answer: There is no public access for external systems, so the cron job is a Databricks job in Azure.

2. Does your ADLS account have public network access disabled?

No

Kindly share your suggestion.

Many Thanks

szymon_dybczak
Esteemed Contributor III

So from a networking/security perspective, one thing doesn't add up for me. You have a VNet-injected workspace, but your storage account has public access enabled. This is a security risk. You should disable public access and create private endpoints. Then all your resources will be able to talk to each other privately.
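
For illustration, those two steps might look roughly like this with the Azure CLI; all resource names are hypothetical placeholders:

```bash
# Hypothetical resource names throughout; adjust to your environment.
# 1. Disable public network access on the storage account.
az storage account update \
  --name mystorageaccount \
  --resource-group my-rg \
  --public-network-access Disabled

# 2. Create a private endpoint for the ADLS Gen2 (dfs) sub-resource
#    in the VNet the Databricks workspace is injected into.
az network private-endpoint create \
  --name adls-dfs-pe \
  --resource-group my-rg \
  --vnet-name my-vnet \
  --subnet pe-subnet \
  --private-connection-resource-id "$(az storage account show \
      --name mystorageaccount --resource-group my-rg --query id -o tsv)" \
  --group-id dfs \
  --connection-name adls-dfs-conn
```

You would also typically create and link a privatelink.dfs.core.windows.net private DNS zone so the storage hostname resolves to the private endpoint's IP from inside the VNet.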

@szymon_dybczak : Thanks, done. Is there anything else you think I should take care of from a networking/security perspective?

szymon_dybczak
Esteemed Contributor III

You know, it depends on how secure you want your environment to be. For example, you can enable the SCC option in your VNet-injected workspace. When secure cluster connectivity is enabled, customer virtual networks have no open ports and compute resources in the classic compute plane have no public IP addresses.

Then you can define user-defined routes, apply them to the Databricks subnets, and redirect all outbound traffic through Azure Firewall (rough sketch below).
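
A rough sketch with the Azure CLI, assuming hypothetical resource names, subnet names, and firewall IP:

```bash
# Route table with a default route that forces all outbound traffic
# from the Databricks subnets through the firewall's private IP.
az network route-table create --name dbx-udr --resource-group my-rg

az network route-table route create \
  --route-table-name dbx-udr \
  --resource-group my-rg \
  --name default-via-firewall \
  --address-prefix 0.0.0.0/0 \
  --next-hop-type VirtualAppliance \
  --next-hop-ip-address 10.0.1.4

# Associate the route table with both Databricks subnets.
az network vnet subnet update \
  --vnet-name my-vnet --resource-group my-rg \
  --name databricks-public --route-table dbx-udr
az network vnet subnet update \
  --vnet-name my-vnet --resource-group my-rg \
  --name databricks-private --route-table dbx-udr
```

Keep in mind that the firewall then needs rules allowing the required Databricks control-plane endpoints, otherwise clusters will fail to start.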

But if you don't have specific requirements, you should just stick to the approach used by your organization.

 

 

Datalight
New Contributor III

My workspace does not have Unity Catalog enabled.

nachoBot
New Contributor II

Datalight,

Regarding 1): I see that you are using the medallion architecture. Have you considered using Auto Loader for the detection and ingestion of new files in ADLS Gen2?
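
For example, a minimal Auto Loader stream over the landing folder might look like this; the paths, checkpoint location, and table name are hypothetical:

```python
# Minimal Auto Loader sketch: incrementally pick up new JSON files from
# the landing folder and append them to a bronze Delta table.
# Paths, schema/checkpoint locations, and table name are placeholders.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation",
            "abfss://landing@<storage-account>.dfs.core.windows.net/_schemas/delivery")
    .load("abfss://landing@<storage-account>.dfs.core.windows.net/adobe/delivery/")
    .writeStream
    .option("checkpointLocation",
            "abfss://landing@<storage-account>.dfs.core.windows.net/_checkpoints/delivery")
    .trigger(availableNow=True)
    .toTable("bronze.adobe_delivery"))
```

With availableNow=True the stream processes all pending files and then stops, so it can run as a scheduled job rather than requiring an always-on cluster.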
