Re: How to build architecture for Batch as well St...

SP_6721 · 06-25-2025

The appropriate approach would be:

Data Ingestion:
- Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable change data capture (CDC) so that only changed records are ingested.
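
To make the CDC idea concrete, here is a minimal plain-Python sketch of applying a batch of change events to a keyed table. The event shape (`op`, `key`, `seq`, `row`) and the function name `apply_changes` are assumptions for illustration and do not come from any specific connector; on Databricks the same logic would typically be a Delta `MERGE` rather than a Python loop.

```python
from typing import Any

# Hypothetical change-event shape (an assumption, not a connector format):
#   {"op": "insert" | "update" | "delete", "key": ..., "seq": int, "row": {...}}


def apply_changes(table: dict[Any, dict], events: list[dict]) -> dict[Any, dict]:
    """Apply CDC events to a keyed table and return the new state."""
    # Copy so the caller's table is not mutated.
    result = {key: dict(row) for key, row in table.items()}
    # Apply in sequence order so out-of-order delivery cannot regress state.
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["op"] == "delete":
            result.pop(event["key"], None)
        else:
            # Inserts and updates are both treated as upserts.
            result[event["key"]] = dict(event["row"])
    return result


current = {1: {"sku": "A", "price": 10}}
changes = [
    {"op": "update", "key": 1, "seq": 2, "row": {"sku": "A", "price": 12}},
    {"op": "insert", "key": 2, "seq": 1, "row": {"sku": "B", "price": 5}},
    {"op": "delete", "key": 2, "seq": 3, "row": None},
]
print(apply_changes(current, changes))
```

Sorting by a sequence number before applying is the important part: it keeps the target consistent even if the source delivers events out of order.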
Data Lakehouse Storage:
- Store all raw data in Azure Data Lake Storage (ADLS) as Delta Lake tables to ensure ACID transactions and reliable data handling.
Analytical Data Handling:
- Use Databricks SQL to power BI dashboards, reports, and analytical workloads on top of your gold layer.
Data Processing:
- Organize data using the Medallion architecture:
  - Bronze - Raw ingested data
  - Silver - Cleaned and conformed data
  - Gold - Aggregated, business-ready data for reporting and consumption
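
As a rough illustration of what each layer is responsible for, the sketch below walks a few rows from Bronze to Gold in plain Python. The column names (`order_id`, `store`, `amount`) and the functions `to_silver` and `to_gold` are made up for the example; in practice these steps would be Spark DataFrame transformations writing to Delta tables.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean and conform: drop incomplete rows, dedupe, normalize types."""
    seen = set()
    silver = []
    for row in bronze_rows:
        order_id = row.get("order_id")
        amount = row.get("amount")
        if order_id is None or amount is None:
            continue  # reject rows missing required fields
        if order_id in seen:
            continue  # drop duplicate deliveries of the same order
        seen.add(order_id)
        silver.append({
            "order_id": str(order_id),
            "store": str(row.get("store", "")).strip().upper(),
            "amount": float(amount),
        })
    return silver


def to_gold(silver_rows):
    """Aggregate to a business-ready shape: total amount per store."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


# Bronze keeps the data as it arrived, including duplicates and bad rows.
bronze = [
    {"order_id": 1, "store": " nyc ", "amount": "19.50"},
    {"order_id": 1, "store": " nyc ", "amount": "19.50"},
    {"order_id": 2, "store": "NYC", "amount": 5},
    {"order_id": None, "store": "sf", "amount": 3},
    {"order_id": 3, "store": "sf", "amount": "7.25"},
]
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping Bronze untouched means Silver and Gold can always be rebuilt if a cleaning rule turns out to be wrong.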
Real-Time Delivery:
- For Spryker’s 15-second real-time requirement, use Databricks Structured Streaming with Azure Event Hubs or Kafka.
- Serve data to consumers such as Salesforce, Spryker, and Mad Mobile via APIs, or share the gold tables with them through REST endpoints or direct access.
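
A streaming job along these lines might look like the sketch below, which reads from a Kafka topic and appends to a Delta table on a short trigger. The function names, the default 5-second trigger, and the "trigger at most half the budget" headroom rule are assumptions of mine, not something from the original requirement; authentication options (needed for Event Hubs or a secured Kafka cluster) are left out, and `spark` is expected to be an existing SparkSession such as the one a Databricks notebook provides.

```python
LATENCY_BUDGET_SECONDS = 15  # the stated Spryker requirement


def kafka_source_options(bootstrap_servers: str, topic: str) -> dict[str, str]:
    """Build options for Spark's Kafka source (auth settings omitted)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def fits_budget(trigger_seconds: int, budget: int = LATENCY_BUDGET_SECONDS) -> bool:
    """Assumed heuristic: leave at least half the budget for processing."""
    return trigger_seconds * 2 <= budget


def start_stream(spark, bootstrap_servers, topic, target_table,
                 checkpoint_path, trigger_seconds=5):
    """Start a micro-batch stream from Kafka into a Delta table."""
    if not fits_budget(trigger_seconds):
        raise ValueError("trigger interval leaves too little headroom")
    events = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    # Kafka delivers key and value as binary, so cast before further parsing.
    parsed = events.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
        "timestamp",
    )
    return (
        parsed.writeStream.format("delta")
        .outputMode("append")
        .option("checkpointLocation", checkpoint_path)
        .trigger(processingTime=f"{trigger_seconds} seconds")
        .toTable(target_table)
    )
```

In a notebook you would call something like `start_stream(spark, "<broker>:9092", "<topic>", "<catalog.schema.table>", "<checkpoint path>")`, with the placeholders replaced by your own values. Whether the end-to-end latency actually stays under 15 seconds depends on batch size and cluster sizing, so it is worth measuring rather than assuming.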
Error Handling & Monitoring:
- Monitor pipelines using Azure Monitor and Databricks system tables to catch failures or delays early.
- Set up alerts and logging to track job health and ensure data quality across the pipeline.
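
For the alerting side, one simple pattern is to look at the latest run of each pipeline and flag anything that failed or has not finished recently. The sketch below shows that check in plain Python; the record fields (`pipeline`, `status`, `finished_at`), the `"SUCCESS"` status value, and the 30-minute threshold are all assumptions for illustration, and in a real setup the run records would come from your job run history (for example Databricks system tables or Azure Monitor logs).

```python
from datetime import datetime, timedelta


def find_unhealthy(runs, now, max_age=timedelta(minutes=30)):
    """Return (pipeline, reason) pairs for failed or stale pipelines."""
    # Keep only the most recent run per pipeline.
    latest = {}
    for run in runs:
        prev = latest.get(run["pipeline"])
        if prev is None or run["finished_at"] > prev["finished_at"]:
            latest[run["pipeline"]] = run

    problems = []
    for name, run in sorted(latest.items()):
        if run["status"] != "SUCCESS":
            problems.append((name, "last run failed"))
        elif now - run["finished_at"] > max_age:
            problems.append((name, "stale"))
    return problems


now = datetime(2025, 6, 25, 12, 0)
runs = [
    {"pipeline": "bronze_sap", "status": "SUCCESS",
     "finished_at": datetime(2025, 6, 25, 11, 50)},
    {"pipeline": "silver_orders", "status": "FAILED",
     "finished_at": datetime(2025, 6, 25, 11, 55)},
    {"pipeline": "gold_sales", "status": "SUCCESS",
     "finished_at": datetime(2025, 6, 25, 10, 0)},
]
for name, reason in find_unhealthy(runs, now):
    print(f"ALERT {name}: {reason}")
```

Checking staleness as well as failures matters because a pipeline that silently stops running never reports an error.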