
Technical Architecture - Feedback

SumedhPuri
New Contributor

Hello Members

I have designed a technical architecture (image attached). I would like feedback on the current design, especially from step 5.1 onwards, along with any ideas for what I could use instead of Azure Service Bus and Cosmos DB to store data in our Data Platform and retrieve it. Does Databricks offer an alternative for data storage and retrieval?

Explanation of each step:

1 - Varying sources (CSV, SFTP Websites, SharePoint, SQL Server, CRM Dynamics, APIs, etc.)

2 - Sources are ingested using Azure Data Factory into Azure Databricks

3 - Databricks is orchestrated using Azure Data Factory

4.1 & 4.2 - Using Azure Databricks, I store, process, transform, and load my data into Delta Lake

5.1 - Using Azure Data Factory, I expose the transformed data to the Web App database (such as Azure Cosmos DB).

5.2 - I also use Power BI to serve data to the business directly from Delta Lake

6.1 - To store and retrieve information, the Web App is connected to Azure Cosmos DB

6.2 - Any form submission received from the business is pushed back into my Delta Lake through a service bus (with the help of Azure Data Factory)

I - Metadata Scanning for Databricks/New Sources etc.

 

Any help is appreciated.

1 REPLY

Schofield
New Contributor III

In step 3, consider using Databricks Workflows for orchestration. The ADF Databricks notebook activity is not actively developed by Microsoft, and the API it uses is considered legacy by Databricks, so neither vendor is actively supporting this integration. There are also some oddities in the implementation related to security that will cause headaches for your platform support team. If you must use ADF, simply invoke the Databricks Jobs API via REST. The pattern is described here: https://techcommunity.microsoft.com/t5/analytics-on-azure-blog/leverage-azure-databricks-jobs-orches.... This approach also lets you call the other Databricks APIs, such as DLT.
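
To make the REST call concrete, here is a minimal Python sketch of what the ADF side would send. It assumes the Jobs API 2.1 `run-now` endpoint and bearer-token authentication; the workspace URL, token placeholder, job ID, and parameter names are hypothetical, and in ADF the same URL, headers, and body would go into a Web activity rather than Python.

```python
import json
import urllib.request


def build_run_now_request(workspace_url, token, job_id, notebook_params=None):
    """Build (but do not send) a POST request for the Jobs API run-now endpoint."""
    payload = {"job_id": job_id}
    if notebook_params:
        # Optional key/value parameters passed to notebook tasks in the job.
        payload["notebook_params"] = notebook_params
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.1/jobs/run-now",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def trigger_job(request):
    """Send the request and return the run_id reported by Databricks."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read())["run_id"]


if __name__ == "__main__":
    # Hypothetical values: replace with your workspace URL, token, and job ID.
    req = build_run_now_request(
        "https://adb-1234567890123456.7.azuredatabricks.net",
        "<access-token>",
        job_id=123,
        notebook_params={"run_date": "2024-01-01"},
    )
    print(req.get_method(), req.full_url)
    # trigger_job(req) would submit the run; it is not called here
    # because it needs a real workspace and a valid token.
```

Keeping the request construction separate from the network call makes it easy to check the URL and body before pointing it at a real workspace.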

Also consider the security implications of serving data to Power BI directly from Delta Lake. For sensitive, regulated data you may have encrypted and/or obfuscated data that must be filtered or masked, which requires a SQL engine to process it before it is served to Power BI. It may be beneficial to have Power BI load data through a Databricks Serverless SQL warehouse, where decryption, row filters, and column masks can be applied according to your security rules. Direct access to cloud storage can't do this, because the ADLS security mechanism doesn't know what data is inside each file. That means Power BI would have to implement its own fine-grained security layer. You would then have a data security layer in both Databricks and Power BI, which increases the complexity of implementing and managing your data governance strategy.
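
As a rough illustration of the row filter and column mask idea, the Python sketch below only generates the SQL statements you would run on the warehouse. It assumes the tables are governed by Unity Catalog, which is what provides `SET ROW FILTER` and `SET MASK`; every table, column, function, and group name is hypothetical.

```python
def row_filter_statements(table, filter_fn, column, admin_group, allowed_value):
    """SQL that hides rows unless the user is in admin_group or the column matches."""
    return [
        f"CREATE OR REPLACE FUNCTION {filter_fn}({column} STRING) "
        f"RETURN IS_ACCOUNT_GROUP_MEMBER('{admin_group}') OR {column} = '{allowed_value}'",
        f"ALTER TABLE {table} SET ROW FILTER {filter_fn} ON ({column})",
    ]


def column_mask_statements(table, mask_fn, column, privileged_group):
    """SQL that returns the real value only to members of privileged_group."""
    return [
        f"CREATE OR REPLACE FUNCTION {mask_fn}({column} STRING) "
        f"RETURN CASE WHEN IS_ACCOUNT_GROUP_MEMBER('{privileged_group}') "
        f"THEN {column} ELSE '***' END",
        f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_fn}",
    ]


if __name__ == "__main__":
    # Hypothetical names. Inputs are interpolated directly into SQL, so they
    # must come from trusted configuration, never from user input.
    statements = row_filter_statements(
        "main.sales.orders", "main.sales.region_filter", "region", "data_admins", "EU"
    ) + column_mask_statements(
        "main.sales.orders", "main.sales.email_mask", "email", "pii_readers"
    )
    for statement in statements:
        print(statement + ";")
```

Because the rules are attached to the table itself, Power BI queries going through the SQL warehouse see only filtered and masked results, without a second security layer in the report.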
