
How to ingest data from SAP Data Services (ECC, IP, MDG, FLP, MRP) to Databricks Lakehouse on GCP?

sanutopia
New Contributor

Hi Friends,

My customer is using Databricks (as a GCP partner product). The ask is to ingest data from multiple sources into the Databricks Lakehouse. Currently the customer has 3 types of sources: SAP (ECC, HANA), Oracle, and Kafka streams.

What Databricks-native ETL services are available to ingest data from all three sources into the Lakehouse?

Would a GCP-native ETL service (like Data Fusion or Dataflow) be a good option over the Databricks-native services?

 Kindly reply. 

Thanks

Santanu

1 REPLY

mark_ott
Databricks Employee

Databricks on GCP offers several native ETL services and integration options to ingest data from SAP (ECC, HANA), Oracle, and Kafka streams into the Lakehouse. Comparing Databricks-native solutions with GCP-native ETL services like Data Fusion or Dataflow reveals tradeoffs in integration, scalability, and workflow preference.

Databricks Native ETL Services

Databricks provides dedicated connectors, pipeline frameworks, and automation tools for ingesting data from major enterprise systems:

  • SAP (ECC, HANA) Integration: Databricks offers SAP connectors and integration flows that let the Lakehouse platform ingest SAP data via JDBC or OData, often using Databricks Notebooks or Lakeflow Declarative Pipelines for streamlined ETL (see the JDBC sketch after this list).

  • Oracle Integration: You can pull data from Oracle databases using Databricks JDBC connectors, Databricks notebooks, or Lakeflow Data Pipelines. Delta Live Tables (DLT) further automates CDC and batch ingestion from Oracle into the Lakehouse (the JDBC sketch below applies here as well).

  • Kafka Stream Integration: Databricks supports batch and streaming ingestion from Apache Kafka using built-in connectors (like the read_kafka table-valued function in Databricks SQL) and Structured Streaming APIs. These allow seamless ingestion directly into Delta Lake tables (see the streaming sketch after this list).

  • Lakeflow Declarative Pipelines: Databricks Lakeflow is an orchestration and data pipeline service enabling low-code ETL from multiple data sources (including Oracle, SAP, and Kafka).

  • Delta Live Tables (DLT): Provides declarative pipeline management and automation for ETL, including support for incremental and streaming sources (see the declarative pipeline sketch after this list).

  • Databricks Notebooks & Auto Loader: SQL/Python/Scala notebooks can leverage Auto Loader for scalable file ingestion, as well as custom code for JDBC or OData connectors (see the Auto Loader sketch below).
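
To make the JDBC path concrete, here is a minimal batch-ingestion sketch for a Databricks Python notebook. The host names, secret scope, source table, and target schema are placeholder assumptions, and the appropriate JDBC driver (SAP HANA ngdbc or Oracle) must be installed on the cluster:

```python
# Minimal JDBC batch-ingestion sketch (placeholder hosts, credentials, tables).
# The same pattern works for Oracle by swapping the JDBC URL and driver.
jdbc_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sap://hana-host:30015")  # Oracle: jdbc:oracle:thin:@//oracle-host:1521/SERVICE
    .option("dbtable", "SAPSR3.MARA")             # hypothetical source table
    .option("user", dbutils.secrets.get("etl-scope", "hana-user"))
    .option("password", dbutils.secrets.get("etl-scope", "hana-password"))
    .load()
)

# Land the data as a Delta table in the bronze layer.
jdbc_df.write.format("delta").mode("overwrite").saveAsTable("bronze.sap_mara")
```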
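
For the Kafka path, a minimal Structured Streaming sketch; the broker, topic, checkpoint path, and target table are placeholders:

```python
# Minimal sketch: streaming ingestion from Kafka into a Delta table.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")  # placeholder broker
    .option("subscribe", "orders")                       # placeholder topic
    .option("startingOffsets", "earliest")
    .load()
)

(raw.selectExpr("CAST(key AS STRING) AS key",
                "CAST(value AS STRING) AS value",
                "timestamp")
    .writeStream.format("delta")
    .option("checkpointLocation", "/tmp/checkpoints/orders")  # placeholder path
    .toTable("bronze.kafka_orders"))
```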
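
And a minimal declarative pipeline sketch using the DLT Python API (the same pattern applies under Lakeflow Declarative Pipelines); the broker, topic, and table names are placeholders:

```python
import dlt

# Bronze: ingest raw Kafka events as a streaming table.
@dlt.table(comment="Raw Kafka events (placeholder broker/topic)")
def bronze_events():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "events")
        .load()
    )

# Silver: a downstream table that reads incrementally from bronze.
@dlt.table(comment="Decoded event payloads")
def silver_events():
    return dlt.read_stream("bronze_events").selectExpr(
        "CAST(value AS STRING) AS payload", "timestamp"
    )
```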
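
Finally, an Auto Loader sketch for file-based ingestion; the bucket, schema location, checkpoint path, and target table are placeholders:

```python
# Minimal Auto Loader sketch: incremental file ingestion from cloud storage.
(spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "/tmp/schemas/landing")  # placeholder
    .load("gs://my-bucket/landing/")                              # placeholder bucket
    .writeStream
    .option("checkpointLocation", "/tmp/checkpoints/landing")     # placeholder
    .toTable("bronze.landing_files"))
```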

GCP Native ETL Services vs. Databricks ETL

GCP's core ETL products, Data Fusion and Dataflow, offer strong integration with GCP-native Lakehouse architectures (BigQuery, Dataplex), automated schema management, and robust scalability:

  • Data Fusion: A fully managed, drag-and-drop ETL tool supporting connections to enterprise databases and streaming sources, and easily loading data into BigQuery, a data lake, or Databricks tables.

  • Dataflow: Built on Apache Beam for scalable batch and stream processing pipelines, with native integration into GCP storage, analytics, and real-time data applications (see the Beam sketch after this list).

  • Data Fusion and Dataflow excel in mixed-cloud, heavily GCP-centric architectures, especially when users prefer BigQuery for analytics or need strong GCP governance features.

  • Databricks ETL is highly optimized for Spark-native workflows and advanced AI/ML with unified data management via Delta Lake, and it supports low-code/no-code pipelines in Lakehouse environments, minimizing data movement and latency when working directly on Databricks.
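
To make the Dataflow option concrete, here is a minimal Apache Beam sketch in Python of the kind of pipeline Dataflow would run. The project, region, broker, topic, and output bucket are placeholder assumptions, and the Kafka source relies on Beam's cross-language Kafka transform (which needs a Java expansion service available at runtime):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.kafka import ReadFromKafka

# Placeholder Dataflow settings.
options = PipelineOptions(
    runner="DataflowRunner", project="my-gcp-project", region="us-central1"
)

with beam.Pipeline(options=options) as p:
    (p
     | "ReadKafka" >> ReadFromKafka(
           consumer_config={"bootstrap.servers": "broker-1:9092"},  # placeholder broker
           topics=["orders"])                                       # placeholder topic
     | "DecodeValue" >> beam.Map(lambda kv: kv[1].decode("utf-8"))  # (key, value) bytes
     | "WriteOut" >> beam.io.WriteToText("gs://my-bucket/out/orders"))  # placeholder bucket
```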

Summary Table

Source  | Databricks Native ETL               | GCP Native ETL Options
--------|-------------------------------------|--------------------------------------
SAP     | Lakeflow, DLT, JDBC/OData notebooks | Data Fusion (SAP connectors)
Oracle  | Lakeflow, DLT, JDBC, notebooks      | Data Fusion, Dataflow (JDBC/CDC)
Kafka   | Structured Streaming, read_kafka    | Dataflow (Beam KafkaIO), Data Fusion

Key Considerations

  • Databricks ETL is preferred for Spark-native workloads, advanced ML/AI, and unified Lakehouse operations.

  • GCP ETL services are robust for multi-cloud, GCP-centric, and BigQuery-focused analytics.

  • Both Databricks and GCP ETL offer connectors for major enterprise sources, but workflow, governance, and cost models differ substantially.

For unified Lakehouse architectures directly on Databricks, native Databricks ETL services (Lakeflow, DLT, and built-in connectors) are usually optimal unless there is a strong requirement to integrate with non-Databricks GCP analytics or governance tools.