SP_6721
Honored Contributor II

Hi @Pratikmsbsvm ,

The appropriate approach would be:

  • Data Ingestion:
    • Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, use change data capture (CDC) so that only new and changed records are ingested.
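
To make the CDC idea concrete, here is a minimal plain-Python sketch of how a stream of change events could be applied to a target table. The event shape (`op`, `key`, `row`) and the name `apply_changes` are assumptions for illustration only, not the format of any particular SAP or Salesforce connector. On Databricks the same upsert/delete logic is usually expressed as a Delta Lake `MERGE`.

```python
def apply_changes(target, events):
    """Apply CDC events, in order, to a table held as {primary_key: row}."""
    result = dict(target)  # work on a copy so the input table is not modified
    for event in events:
        op, key = event["op"], event["key"]
        if op in ("insert", "update"):
            # Upsert: keep existing columns, overwrite the changed ones.
            result[key] = {**result.get(key, {}), **event["row"]}
        elif op == "delete":
            result.pop(key, None)  # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown CDC operation: {op!r}")
    return result


# Hypothetical sample data.
customers = {1: {"name": "Ada", "city": "Berlin"}}
changes = [
    {"op": "update", "key": 1, "row": {"city": "Munich"}},
    {"op": "insert", "key": 2, "row": {"name": "Lin", "city": "Oslo"}},
    {"op": "delete", "key": 2},
]
current = apply_changes(customers, changes)
```

Only the three change events are processed here, rather than a full reload of the source table, which is the main reason CDC suits near real-time updates.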

  • Data Lakehouse Storage:
    • Store all raw data in Azure Data Lake Storage (ADLS) as Delta Lake tables, which add ACID transactions and schema enforcement on top of the underlying files.

  • Analytical Data Handling:
    • Use Databricks SQL to serve BI dashboards, reports, and analytical workloads from the gold layer (see Data Processing below).

  • Data Processing:
    • Organize data using the Medallion architecture:
      • Bronze - Raw ingested data
      • Silver - Cleaned and conformed data
      • Gold - Aggregated, business-ready data for reporting and consumption
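
As a rough illustration of what each layer does, the sketch below runs the bronze, silver, and gold steps on a few in-memory rows. It is plain Python rather than Spark code, and the column names (`order_id`, `store`, `amount`) and function names are assumptions made up for this example. In Databricks each step would typically read from and write to its own Delta table.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean and conform: drop bad rows, remove duplicates, normalize values."""
    seen = set()
    silver = []
    for row in bronze_rows:
        order_id = row.get("order_id")
        try:
            amount = float(row.get("amount"))
        except (TypeError, ValueError):
            continue  # skip rows whose amount is missing or not a number
        if order_id is None or order_id in seen:
            continue  # skip rows without a key and repeated keys
        seen.add(order_id)
        silver.append({
            "order_id": order_id,
            "store": str(row.get("store", "")).strip().upper(),
            "amount": amount,
        })
    return silver


def to_gold(silver_rows):
    """Aggregate to a business-ready view: total amount per store."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["store"]] += row["amount"]
    return dict(totals)


# Hypothetical raw rows as they might land in bronze.
bronze = [
    {"order_id": "A1", "store": " berlin ", "amount": "10.50"},
    {"order_id": "A1", "store": " berlin ", "amount": "10.50"},  # duplicate
    {"order_id": "A2", "store": "Berlin", "amount": "4.50"},
    {"order_id": "A3", "store": "oslo", "amount": "n/a"},  # unparseable amount
]
silver = to_silver(bronze)
gold = to_gold(silver)
```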

  • Real-Time Delivery:
    • To meet Spryker’s 15-second latency requirement, use Structured Streaming on Databricks with Azure Event Hubs or Kafka as the event source.
    • Serve data to consumers such as Salesforce, Spryker, and Mad Mobile through REST APIs, or give them direct access to the gold tables.
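
Whether a streaming pipeline fits inside the 15-second window depends mostly on the micro-batch trigger interval, the time each batch takes to process, and the time needed to deliver the result to the consumer. The sketch below is a back-of-the-envelope estimate under a simplified model of fixed-interval micro-batches. The function names and the sample numbers are assumptions, not measurements from any real workload.

```python
def worst_case_latency_s(trigger_s, processing_s, delivery_s):
    """Estimate worst-case end-to-end latency for fixed-interval micro-batches.

    An event that arrives just after a batch starts waits for the next batch.
    That wait is the trigger interval, or the processing time if batches run
    longer than the interval. The event is then processed and delivered.
    """
    wait_s = max(trigger_s, processing_s)
    return wait_s + processing_s + delivery_s


def fits_budget(trigger_s, processing_s, delivery_s, budget_s=15.0):
    """Return True if the estimated worst case stays within the budget."""
    return worst_case_latency_s(trigger_s, processing_s, delivery_s) <= budget_s


# Hypothetical numbers: 5 s trigger, 3 s per batch, 2 s to reach the consumer.
estimate = worst_case_latency_s(trigger_s=5, processing_s=3, delivery_s=2)
ok = fits_budget(trigger_s=5, processing_s=3, delivery_s=2)
```

Under this model a 10-second trigger with 4 seconds of processing and 2 seconds of delivery would already exceed the budget, so the trigger interval is usually the first setting to tune.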

  • Error Handling & Monitoring:
    • Monitor pipelines using Azure Monitor and Databricks system tables to catch failures or delays early.
    • Set up alerts and logging to track job health and ensure data quality across the pipeline.
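
A simple health check along these lines could look like the sketch below. The run records are hypothetical dictionaries with `job`, `status`, and `finished_at` fields chosen for this example. They do not reflect the actual schema of Databricks system tables or Azure Monitor logs, so the field names would need to be mapped to whatever source you query.

```python
from datetime import datetime, timedelta, timezone


def find_alerts(runs, now, max_staleness=timedelta(minutes=30)):
    """Return one alert message per job whose latest run failed or is stale."""
    alerts = []
    for run in runs:
        if run["status"] != "SUCCEEDED":
            alerts.append(f"{run['job']}: last run {run['status']}")
        elif now - run["finished_at"] > max_staleness:
            alerts.append(f"{run['job']}: no successful run in the last {max_staleness}")
    return alerts


# Hypothetical latest-run records, one per job.
now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
latest_runs = [
    {"job": "ingest_sap", "status": "SUCCEEDED", "finished_at": now - timedelta(minutes=5)},
    {"job": "ingest_salesforce", "status": "FAILED", "finished_at": now - timedelta(minutes=2)},
    {"job": "build_gold", "status": "SUCCEEDED", "finished_at": now - timedelta(hours=2)},
]
alerts = find_alerts(latest_runs, now)
```

The resulting messages could then be forwarded to whichever alerting channel the team already uses.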