
How to design Airship Integration with Azure Databricks

Datalight
New Contributor III

Hello,

I have to push data from Airship and persist it to Delta tables. I think we can use SFTP. Could someone please help me design the inbound part, using SFTP on the Airship end to push files to ADLS Gen2?

I would also like to understand the networking and security considerations of how this could work.

(attached diagram: Datalight_0-1757430273153.png)
1 ACCEPTED SOLUTION


ManojkMohan
Contributor III

Inbound Flow Design

1. Enable SFTP support on the ADLS Gen2 (or Azure Blob Storage) account.

2. Generate an SSH public/private key pair and register it with Airship; enter your SFTP endpoint credentials (username, host, port, key) in Airship's settings to authenticate uploads.

3. Configure Airship to push files (CSV or other supported formats) to the specific SFTP directory in your ADLS Gen2 account designated for inbound data.

4. Trigger Azure Data Factory (ADF) or Databricks jobs using storage events (e.g., SFTP commit events, which ensure files are fully uploaded before processing) to ingest these files and transform them into Delta tables (see the sketch below).
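For the Databricks side of step 4, here is a minimal sketch using Auto Loader to pick up new files from the SFTP landing path and append them to a Bronze Delta table. The storage paths and the table name are placeholders, not values from this thread; adjust them to your own storage account and catalog.

# Minimal Auto Loader sketch: incrementally ingest Airship SFTP drops into a Bronze Delta table.
# All paths and table names below are hypothetical placeholders.

landing_path = "abfss://landing@<storage-account>.dfs.core.windows.net/airship/inbound/"
schema_path = "abfss://landing@<storage-account>.dfs.core.windows.net/airship/_schemas/"
checkpoint_path = "abfss://landing@<storage-account>.dfs.core.windows.net/airship/_checkpoints/bronze/"

bronze_df = (
    spark.readStream
    .format("cloudFiles")                          # Auto Loader source
    .option("cloudFiles.format", "csv")            # Airship pushes CSV files per the steps above
    .option("cloudFiles.schemaLocation", schema_path)
    .option("header", "true")
    .load(landing_path)
)

(
    bronze_df.writeStream
    .option("checkpointLocation", checkpoint_path)
    .trigger(availableNow=True)                    # process all new files, then stop (fits a scheduled job)
    .toTable("main.airship.bronze_events")         # hypothetical Unity Catalog table
)

Because Auto Loader tracks already-processed files in the checkpoint, running this on a schedule or from a file arrival trigger ingests each dropped file exactly once.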


5 REPLIES


@ManojkMohan: What would be the better approach: having a separate landing zone and then a Bronze layer, or can a single Bronze layer also be treated as the landing zone?

szymon_dybczak
Esteemed Contributor III

Hi @Datalight,

If I were you, I would add a separate landing zone. In our project the landing zone has been extremely valuable. Among other things, it lets you separate concerns between extracting data and loading/processing it. It also allows you to easily reprocess data whenever that becomes necessary.
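To illustrate the reprocessing point, a rough sketch of what a separate landing zone makes possible, with hypothetical paths and table names:

# Rebuild the Bronze table from the raw files still sitting in the landing zone,
# e.g. after fixing a parsing bug. This only works because the raw SFTP drops
# are kept separately from the Bronze Delta table.
raw = (
    spark.read
    .option("header", "true")
    .csv("abfss://landing@<storage-account>.dfs.core.windows.net/airship/inbound/")
)

(
    raw.write
    .format("delta")
    .mode("overwrite")                    # full reload from the immutable raw files
    .option("overwriteSchema", "true")
    .saveAsTable("main.airship.bronze_events")
)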

@szymon_dybczak: Thanks a lot.

What do you think: how much effort would it be on the DevOps side to switch from loading directly into Bronze to landing first and then Bronze, given that we have already deployed the 4 data pipeline use cases?

 

Suggested Design

  1. SFTP uploads from Airship go to a Landing Zone folder in ADLS Gen2.
  2. Storage event triggers an orchestration pipeline (ADF/Databricks) to read and process raw files.
  3. Processed data lands in the Bronze layer Delta tables with schema enforcement (see the sketch below).
  4. Silver and Gold layers follow based on business requirements.
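To make step 3 concrete, a small sketch of enforcing an explicit schema on the Bronze write; the column names are invented for illustration, since Airship export schemas vary by project:

from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical column set for an Airship event export; adjust to the actual files.
airship_schema = StructType([
    StructField("event_id", StringType(), nullable=False),
    StructField("event_type", StringType(), nullable=True),
    StructField("channel_id", StringType(), nullable=True),
    StructField("occurred", TimestampType(), nullable=True),
])

raw = (
    spark.read
    .schema(airship_schema)               # explicit schema instead of inference
    .option("header", "true")
    .option("mode", "FAILFAST")           # fail the job on malformed rows
    .csv("abfss://landing@<storage-account>.dfs.core.windows.net/airship/inbound/")
)

# Delta Lake enforces the table schema on append, so later files with a
# mismatched structure fail the write instead of silently corrupting Bronze.
raw.write.format("delta").mode("append").saveAsTable("main.airship.bronze_events")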
