
How to build an architecture for batch as well as streaming data pipelines in Databricks

Pratikmsbsvm
New Contributor III

Hello,

I am planning to create a Data Lakehouse using Azure and Databricks.

Earlier I planned to build this with Azure services alone, but the use cases look complex.

Can someone please help me with suggestions?

Source systems: SAP, Salesforce, SAP CAR, Adobe Clickstream.

Consumers: Salesforce, Spryker, Mad Mobile [API-led integration].

How should analytical data be handled?

How should transactional data be handled?

How should error handling and connectivity be designed?

Real-time data is consumed by Spryker every 15 seconds.

Thanks a lot for any suggestions.

2 REPLIES

SP_6721
Contributor III

Hi @Pratikmsbsvm,

The appropriate approach would be:

  • Data Ingestion:
    • Ingest data from SAP, SAP CAR, and Salesforce using Azure Data Factory or third-party connectors. For near real-time updates, enable CDC-based ingestion.
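
A minimal ingestion sketch using Databricks Auto Loader, assuming the extracted SAP/Salesforce files land as JSON in an ADLS container (all paths and table names below are placeholders):

# `spark` is the ambient SparkSession in a Databricks notebook.
raw_path = "abfss://raw@yourstorageaccount.dfs.core.windows.net/sap/"
checkpoint = "abfss://raw@yourstorageaccount.dfs.core.windows.net/_checkpoints/sap_bronze"

bronze_stream = (
    spark.readStream.format("cloudFiles")        # Auto Loader source
    .option("cloudFiles.format", "json")         # format of the landed files
    .option("cloudFiles.schemaLocation", checkpoint)
    .load(raw_path)
)

(bronze_stream.writeStream
    .option("checkpointLocation", checkpoint)
    .trigger(availableNow=True)                  # incremental, batch-style run
    .toTable("lakehouse.bronze.sap_raw"))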

  • Data Lakehouse Storage:
    • Store all raw data in Azure Data Lake Storage (ADLS) as Delta Lake tables to ensure ACID transactions and reliable data handling.
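
As a sketch, a Delta table backed by ADLS can be registered like this (catalog, schema, and storage account names are assumptions):

# External Delta table on ADLS; Delta provides the ACID guarantees.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lakehouse.bronze.salesforce_raw (
        id STRING, payload STRING, ingested_at TIMESTAMP
    )
    USING DELTA
    LOCATION 'abfss://lake@yourstorageaccount.dfs.core.windows.net/bronze/salesforce_raw'
""")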

  • Analytical Data Handling:
    • Use Databricks SQL to power BI dashboards, reports, and analytical workloads on top of your gold layer.
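
For illustration, the same query a Databricks SQL dashboard would run can be prototyped from a notebook (the gold table and its columns are hypothetical):

# Aggregate revenue per day from a business-ready gold table.
daily_revenue = spark.sql("""
    SELECT order_date, SUM(net_amount) AS revenue
    FROM lakehouse.gold.sales_daily
    GROUP BY order_date
    ORDER BY order_date
""")
daily_revenue.show()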

  • Data Processing:
    • Organize data using the Medallion architecture:
      • Bronze - Raw ingested data
      • Silver - Cleaned and conformed data
      • Gold - Aggregated, business-ready data for reporting and consumption
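
A compact sketch of the Bronze-to-Silver step, assuming a hypothetical bronze table keyed by id:

from pyspark.sql import functions as F

# Bronze -> Silver: conform and de-duplicate raw events.
bronze = spark.read.table("lakehouse.bronze.sap_raw")

silver = (
    bronze
    .filter(F.col("payload").isNotNull())        # drop unusable records
    .dropDuplicates(["id"])                      # one row per business key
    .withColumn("processed_at", F.current_timestamp())
)

silver.write.mode("overwrite").saveAsTable("lakehouse.silver.sap_events")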

  • Real-Time Delivery:
    • For Spryker's 15-second real-time requirement, use Databricks Structured Streaming with Azure Event Hubs or Kafka.
    • Serve data to consumers like Salesforce, Spryker, and Mad Mobile via APIs or by sharing gold tables through REST endpoints or direct access.
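
A sketch of the streaming leg, reading Event Hubs through its Kafka-compatible endpoint with a 15-second processing trigger (namespace, topic, checkpoint path, and the connection-string placeholder are all assumptions):

# Connection details below are placeholders for your Event Hubs namespace.
kafka_opts = {
    "kafka.bootstrap.servers": "yournamespace.servicebus.windows.net:9093",
    "subscribe": "clickstream",
    "kafka.security.protocol": "SASL_SSL",
    "kafka.sasl.mechanism": "PLAIN",
    "kafka.sasl.jaas.config": (
        'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
        'required username="$ConnectionString" password="<event-hubs-conn-string>";'
    ),
}

stream = spark.readStream.format("kafka").options(**kafka_opts).load()

(stream.selectExpr("CAST(value AS STRING) AS event")
    .writeStream
    .trigger(processingTime="15 seconds")        # matches Spryker's cadence
    .option("checkpointLocation", "/tmp/checkpoints/spryker_feed")
    .toTable("lakehouse.gold.spryker_feed"))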

  • Error Handling & Monitoring:
    • Monitor pipelines using Azure Monitor and Databricks system tables to catch failures or delays early.
    • Set up alerts and logging to track job health and ensure data quality across the pipeline.
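
One portable way to catch streaming failures early is a StreamingQueryListener (available in recent Spark/DBR versions); the prints below stand in for whatever alerting channel you use:

from pyspark.sql.streaming import StreamingQueryListener

class AlertingListener(StreamingQueryListener):
    """Log basic health signals for every streaming query; replace the
    prints with your real alerting (Azure Monitor, webhook, ...)."""
    def onQueryStarted(self, event):
        print(f"stream started: {event.name} ({event.id})")
    def onQueryProgress(self, event):
        p = event.progress
        print(f"{p.name}: batch {p.batchId}, {p.numInputRows} input rows")
    def onQueryTerminated(self, event):
        if event.exception:                      # failure -> raise an alert
            print(f"stream FAILED: {event.id}: {event.exception}")

spark.streams.addListener(AlertingListener())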

Pratikmsbsvm
New Contributor III

@SP_6721: Thanks a lot. But how should I handle transactional data? Do I need to add Azure SQL? Please suggest.