
How to build an architecture for batch as well as streaming data pipelines in Databricks

Pratikmsbsvm — Tue, 24 Jun 2025 04:39:01 GMT

Hello,

I am planning to create a data lakehouse using Azure and Databricks.

Earlier I planned to do it with Azure, but the use cases look complex.

Can someone please help me with suggestions?

Source systems: SAP, Salesforce, SAP CAR, Adobe Clickstream.

Consumers: Salesforce, Spryker, Mad Mobile (API-led integration)

How to handle analytical data

How to handle transactional data

Error handling and connectivity

Real-time data is consumed by Spryker every 15 seconds.

Thanks a lot for any suggestions.

SP_6721 — Wed, 25 Jun 2025 12:11:44 GMT

The appropriate approach would be:

Data Ingestion:
- Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable CDC-based ingestion.
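
To make the CDC idea concrete, here is a minimal sketch in plain Python of how a batch of change records is applied to a keyed target. It is only an illustration: the record layout (`key`, `seq`, `op`, `data`) and the function name `apply_cdc` are hypothetical, and on Databricks the same upsert/delete logic would normally be expressed as a Delta Lake merge rather than in Python dictionaries.

```python
from typing import Any


def apply_cdc(
    target: dict[str, dict[str, Any]],
    changes: list[dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Apply change records to a keyed target and return the new state."""
    result = dict(target)  # shallow copy so the input is left untouched

    # Keep only the latest change per key, ordered by sequence number.
    latest: dict[str, dict[str, Any]] = {}
    for change in changes:
        key = change["key"]
        if key not in latest or change["seq"] > latest[key]["seq"]:
            latest[key] = change

    for key, change in latest.items():
        if change["op"] == "delete":
            result.pop(key, None)
        else:
            # "insert" and "update" are both treated as upserts.
            result[key] = change["data"]
    return result


# Hypothetical sample data for illustration only.
target = {"C1": {"name": "Acme", "city": "Berlin"}}
changes = [
    {"key": "C1", "seq": 1, "op": "update", "data": {"name": "Acme", "city": "Munich"}},
    {"key": "C2", "seq": 2, "op": "insert", "data": {"name": "Globex", "city": "Hamburg"}},
    {"key": "C2", "seq": 3, "op": "delete", "data": None},
    {"key": "C3", "seq": 4, "op": "insert", "data": {"name": "Initech", "city": "Cologne"}},
]
state = apply_cdc(target, changes)
```

Reducing the batch to the latest change per key before applying it matters because a single batch can contain several changes for the same record, as with `C2` above.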
Data Lakehouse Storage:
- Store all raw data in Azure Data Lake Storage (ADLS) as Delta Lake tables to ensure ACID transactions and reliable data handling.
Analytical Data Handling:
- Use Databricks SQL to power BI dashboards, reports, and analytical workloads on top of your gold layer.
Data Processing:
- Organize data using the Medallion architecture:
  - Bronze - Raw ingested data
  - Silver - Cleaned and conformed data
  - Gold - Aggregated, business-ready data for reporting and consumption
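
As a rough illustration of what each layer is responsible for, the sketch below walks a few rows through bronze, silver, and gold in plain Python. The column names (`order_id`, `country`, `amount`) and the function names are assumptions made up for this example; in Databricks each step would typically be a DataFrame or SQL transformation writing to its own Delta table.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean and conform raw rows: drop incomplete rows, fix types, deduplicate."""
    seen = set()
    silver = []
    for row in bronze_rows:
        order_id = row.get("order_id")
        amount = row.get("amount")
        if not order_id or amount is None:
            continue  # reject rows without a key or an amount
        if order_id in seen:
            continue  # drop duplicate deliveries of the same order
        seen.add(order_id)
        silver.append(
            {
                "order_id": order_id,
                "country": (row.get("country") or "").strip().upper(),
                "amount": float(amount),
            }
        )
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows into a business-ready summary: revenue per country."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


# Bronze: raw rows exactly as ingested, including duplicates and bad records.
bronze = [
    {"order_id": "A1", "country": " de ", "amount": "10.50"},
    {"order_id": "A1", "country": " de ", "amount": "10.50"},  # duplicate
    {"order_id": "A2", "country": "DE", "amount": "4.50"},
    {"order_id": None, "country": "FR", "amount": "7.00"},  # missing key
    {"order_id": "A3", "country": "fr", "amount": "20"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
```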
Real-Time Delivery:
- For Spryker’s 15-second real-time requirement, use Databricks Structured Streaming with Azure Event Hubs or Kafka.
- Serve data to consumers such as Salesforce, Spryker, and Mad Mobile through APIs, either by exposing gold tables via REST endpoints or by granting direct access to them.
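
A minimal sketch of the streaming part, assuming a Kafka-compatible source and a Delta target table. The broker address, topic name, and the helper names `kafka_source_options` and `start_bronze_stream` are placeholders invented for this example, and the `spark` session is expected to be provided by the Databricks runtime. The 5-second trigger is just one possible choice that leaves headroom under the 15-second target; whether the end-to-end latency is actually met also depends on how the data is served downstream.

```python
def kafka_source_options(bootstrap_servers: str, topic: str) -> dict[str, str]:
    """Build the reader options for a Kafka-compatible Structured Streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_bronze_stream(spark, options, target_table, checkpoint_path, interval_seconds=5):
    """Start a streaming query that lands raw events in a Delta table.

    `spark` is an existing SparkSession (assumed to come from the runtime).
    """
    events = spark.readStream.format("kafka").options(**options).load()
    return (
        events.selectExpr(
            "CAST(key AS STRING) AS key",
            "CAST(value AS STRING) AS value",
            "timestamp",
        )
        .writeStream.format("delta")
        # The checkpoint lets the query resume from where it stopped after a restart.
        .option("checkpointLocation", checkpoint_path)
        # Micro-batch interval chosen below the 15-second consumer requirement.
        .trigger(processingTime=f"{interval_seconds} seconds")
        .toTable(target_table)
    )


# Placeholder values for illustration only.
options = kafka_source_options("broker-1.example.com:9092", "clickstream-events")
```

In a notebook, the query would then be started with something like `start_bronze_stream(spark, options, "bronze.clickstream", "/path/to/checkpoint")`, where the table name and checkpoint path are again placeholders.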
Error Handling & Monitoring:
- Monitor pipelines using Azure Monitor and Databricks system tables to catch failures or delays early.
- Set up alerts and logging to track job health and ensure data quality across the pipeline.
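
One simple check that could back such an alert is a freshness test against the 15-second target. The sketch below is plain Python and purely illustrative: the function name `check_freshness` and the idea of feeding it the timestamp of the last successful update are assumptions, and in practice that timestamp would come from your pipeline logs or monitoring tables.

```python
from datetime import datetime, timedelta, timezone


def check_freshness(last_update, now, max_lag=timedelta(seconds=15)):
    """Return the current lag and whether it exceeds the allowed maximum."""
    lag = now - last_update
    return {"lag_seconds": lag.total_seconds(), "stale": lag > max_lag}


# Fixed timestamps so the example is reproducible.
now = datetime(2025, 6, 25, 12, 0, 30, tzinfo=timezone.utc)
fresh = check_freshness(datetime(2025, 6, 25, 12, 0, 20, tzinfo=timezone.utc), now)
stale = check_freshness(datetime(2025, 6, 25, 12, 0, 0, tzinfo=timezone.utc), now)
```

A result with `stale` set to `True` is the condition you would route to an alert.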

Pratikmsbsvm — Wed, 25 Jun 2025 14:44:58 GMT

@SP_6721: Thanks a lot. But how should I handle transactional data? Do I need to add Azure SQL? Please suggest.