
How to build an architecture for batch as well as streaming data pipelines in Databricks

Pratikmsbsvm
New Contributor III

Hello,

I am planning to create a Data Lakehouse using Azure and Databricks.

Earlier I planned to build this with Azure services alone, but the use cases look complex.

Can someone please help me with suggestions?

Source systems: SAP, Salesforce, SAP CAR, Adobe Clickstream.

Consumers: Salesforce, Spryker, Mad Mobile [API-led integration].

How should analytical data be handled?

How should transactional data be handled?

How should error handling and connectivity be designed?

Real-time data is consumed by Spryker every 15 seconds.

Thanks a lot for any suggestions.

2 REPLIES

SP_6721
Contributor III

Hi @Pratikmsbsvm,

The appropriate approach would be:

  • Data Ingestion:
    • Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable CDC-based ingestion.
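
A minimal ingestion sketch using Databricks Auto Loader, assuming the extracted SAP/Salesforce files land as JSON in an ADLS container (all paths and table names below are placeholders):

# `spark` is the ambient SparkSession in a Databricks notebook.
raw_path = "abfss://raw@yourstorageaccount.dfs.core.windows.net/sap/"
checkpoint = "abfss://raw@yourstorageaccount.dfs.core.windows.net/_checkpoints/sap_bronze"

bronze_stream = (
    spark.readStream.format("cloudFiles")        # Auto Loader source
    .option("cloudFiles.format", "json")         # format of the landed files
    .option("cloudFiles.schemaLocation", checkpoint)
    .load(raw_path)
)

(bronze_stream.writeStream
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)                  # incremental, batch-style run
    .toTable("lakehouse.bronze.sap_raw"))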

  • Data Lakehouse Storage:
    • Store all raw data in Azure Data Lake Storage (ADLS) as Delta Lake tables to ensure ACID transactions and reliable data handling.
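
As a sketch, a Delta table backed by ADLS can be registered like this (catalog, schema, and storage account names are assumptions):

# External Delta table on ADLS; Delta provides the ACID guarantees.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.bronze.salesforce_raw (
        id STRING, payload STRING, ingested_at TIMESTAMP
    )
    USING DELTA
    LOCATION 'abfss://lake@yourstorageaccount.dfs.core.windows.net/bronze/salesforce_raw'
""")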

  • Analytical Data Handling:
    • Use Databricks SQL to power BI dashboards, reports, and analytical workloads on top of your gold layer.
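
For illustration, the same query a Databricks SQL dashboard would run can be prototyped from a notebook (the gold table and its columns are hypothetical):

# Aggregate revenue per day from a business-ready gold table.
daily_revenue = spark.sql("""
    SELECT order_date, SUM(net_amount) AS revenue
    FROM lakehouse.gold.sales_daily
    GROUP BY order_date
    ORDER BY order_date
""")
daily_revenue.show()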

  • Data Processing:
    • Organize data using the Medallion architecture:
      • Bronze - Raw ingested data
      • Silver - Cleaned and conformed data
      • Gold - Aggregated, business-ready data for reporting and consumption
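
A compact sketch of the Bronze-to-Silver step, assuming a hypothetical bronze table keyed by id:

from pyspark.sql import functions as F

# Bronze -> Silver: conform and de-duplicate raw events.
bronze = spark.read.table("lakehouse.bronze.sap_raw")

silver = (
    bronze
    .filter(F.col("payload").isNotNull())        # drop unusable records
    .dropDuplicates(["id"])                      # one row per business key
    .withColumn("processed_at", F.current_timestamp())
)

silver.write.mode("overwrite").saveAsTable("lakehouse.silver.sap_events")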

  • Real-Time Delivery:
    • For Spryker's 15-second real-time requirement, use Databricks Structured Streaming with Azure Event Hubs or Kafka.
    • Serve data to consumers like Salesforce, Spryker, and Mad Mobile via APIs or by sharing gold tables through REST endpoints or direct access.
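
A sketch of the streaming leg, reading Event Hubs through its Kafka-compatible endpoint with a 15-second processing trigger (namespace, topic, checkpoint path, and the connection-string placeholder are all assumptions):

# Connection details below are placeholders for your Event Hubs namespace.
kafka_opts = {
    "kafka.bootstrap.servers": "yournamespace.servicebus.windows.net:9093",
    "subscribe": "clickstream",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
        'required username="$ConnectionString" password="<event-hubs-conn-string>";'
    ),
}

stream = spark.readStream.format("kafka").options(**kafka_opts).load()

(stream.selectExpr("CAST(value AS STRING) AS event")
    .writeStream
    .trigger(processingTime="15 seconds")        # matches Spryker's cadence
    .option("checkpointLocation", "/tmp/checkpoints/spryker_feed")
    .toTable("lakehouse.gold.spryker_feed"))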

  • Error Handling & Monitoring:
    • Monitor pipelines using Azure Monitor and Databricks system tables to catch failures or delays early.
    • Set up alerts and logging to track job health and ensure data quality across the pipeline.
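
One portable way to catch streaming failures early is a StreamingQueryListener (available in recent Spark/DBR versions); the prints below stand in for whatever alerting channel you use:

from pyspark.sql.streaming import StreamingQueryListener

class AlertingListener(StreamingQueryListener):
    """Log basic health signals for every streaming query; replace the
    prints with your real alerting (Azure Monitor, webhook, ...)."""
    def onQueryStarted(self, event):
        print(f"stream started: {event.name} ({event.id})")
    def onQueryProgress(self, event):
        p = event.progress
        print(f"{p.name}: batch {p.batchId}, {p.numInputRows} input rows")
    def onQueryTerminated(self, event):
        if event.exception:                      # failure -> raise an alert
            print(f"stream FAILED: {event.id}: {event.exception}")

spark.streams.addListener(AlertingListener())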

Pratikmsbsvm
New Contributor III

@SP_6721: Thanks a lot. But how should I handle transactional data? Do I need to add Azure SQL? Please suggest.