
Integration patterns with Oracle database

HideK253
New Contributor

Hello everyone

I'd like to try integrating Databricks with an on-premises Oracle database, and I have the following fundamental questions about the architecture of that integration.

1) I believe the following are the main ways to integrate with an Oracle database, but are there any other options?

- Convert table data to CSV/Parquet via an ETL tool and ingest the files via COPY INTO / Auto Loader

- Use Lakehouse Federation to query the table data

- Use Lakeflow Connect query-based connectors to ingest the table data

2) Do Lakehouse Federation and the query-based connectors for Oracle use JDBC connections?

3) Is there any way to implement CDC for Oracle? I believe Oracle is not listed among the supported databases for Lakeflow CDC connectors.

 

Thank you

 

1 REPLY

Ashwin_DSA
Databricks Employee

Hi @HideK253,

Your list is essentially correct. The main Databricks-native patterns are export to files + COPY INTO / Auto Loader, Lakehouse Federation, and Lakeflow Connect; a minimal Auto Loader sketch of the file-based pattern is shown below. A couple of other options are listed after that.
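
Here is a rough sketch of that file-export pattern, assuming an ETL tool drops Parquet exports from Oracle into cloud storage and you run this from a Databricks notebook (where spark is predefined). All paths and table names are placeholders:

```python
# Minimal Auto Loader sketch: incrementally pick up Parquet files exported
# from Oracle and append them to a bronze Delta table.
# All paths and table names below are illustrative placeholders.
(
    spark.readStream.format("cloudFiles")                  # Auto Loader source
    .option("cloudFiles.format", "parquet")                # format of the exported files
    .option("cloudFiles.schemaLocation",
            "/Volumes/main/raw/_schemas/orders")           # where the inferred schema is tracked
    .load("/Volumes/main/raw/oracle_exports/orders/")      # landing folder for the exports
    .writeStream
    .option("checkpointLocation",
            "/Volumes/main/raw/_checkpoints/orders")
    .trigger(availableNow=True)                            # process new files, then stop
    .toTable("main.bronze.orders")                         # target Delta table
)
```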

  • Custom Spark JDBC jobs: spark.read.format("jdbc") from Oracle into Delta, often with your own watermark column for incremental loads (see the sketch after this list).
  • You can also pair third-party CDC tools (Oracle GoldenGate, Fivetran/HVR, Debezium+Kafka, etc.) with Databricks to land CDC data into cloud storage and then process it into Delta.
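
As a rough illustration of the custom JDBC approach, here is a sketch of an incremental pull keyed on a hypothetical updated_at watermark column. The host, schema, credentials, secret scope, and column names are all placeholders, the Oracle JDBC driver must be available on the cluster, and spark/dbutils come from the notebook environment:

```python
from pyspark.sql import functions as F

target = "main.bronze.orders"                 # placeholder Delta target
jdbc_url = "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCLPDB1"

# High-water mark already loaded; fall back to an early timestamp on the first run.
if spark.catalog.tableExists(target):
    last_ts = spark.table(target).agg(F.max("updated_at")).first()[0]
else:
    last_ts = "1970-01-01 00:00:00"

incremental = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("user", "dbx_reader")                                      # placeholder user
    .option("password", dbutils.secrets.get("oracle", "reader-pwd"))   # secret scope/key are placeholders
    .option("query",
            f"SELECT * FROM sales.orders "
            f"WHERE updated_at > TIMESTAMP '{last_ts}'")               # watermark predicate pushed to Oracle
    .load()
)

incremental.write.mode("append").saveAsTable(target)
```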

Regarding your question about Lakehouse Federation and the query-based connectors, the answer is yes for both. Lakehouse Federation pushes queries down via JDBC to the foreign database, including Oracle, and the Lakeflow Connect query-based connectors use Unity Catalog connections and Lakehouse Federation under the hood, so the Oracle connector is part of the same JDBC-based family. You don't manage the JDBC details directly; Databricks does.
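
To make the Federation part concrete, the setup looks roughly like the sketch below, issued as SQL from a notebook. The connection/catalog names and credentials are placeholders, and the exact Oracle option keys (host, port, user, password, service name) should be verified against the Lakehouse Federation docs; the point is that once the connection object exists, Databricks handles the JDBC layer for you:

```python
# Rough Lakehouse Federation setup for Oracle via spark.sql().
# Option keys are written from memory and should be checked against the docs;
# names and credentials are placeholders.
spark.sql("""
  CREATE CONNECTION oracle_onprem TYPE oracle
  OPTIONS (
    host 'oracle-host.example.com',
    port '1521',
    user 'dbx_reader',
    password secret('oracle', 'reader-pwd')   -- reference a secret rather than inlining a password
  )
""")

spark.sql("""
  CREATE FOREIGN CATALOG oracle_cat
  USING CONNECTION oracle_onprem
  OPTIONS (service_name 'ORCLPDB1')
""")

# Queries against the foreign catalog are pushed down to Oracle over JDBC.
spark.sql("SELECT COUNT(*) AS n FROM oracle_cat.sales.orders").show()
```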

You're correct: Oracle is not yet in the public GA list of Lakeflow CDC database connectors (those call out MySQL, PostgreSQL, and SQL Server). So the best options as of today are:

The Lakeflow Connect query-based connector for Oracle, which provides incremental loads via a cursor column (and supports SCD1/SCD2 and soft/hard deletes). Note that this is not log-based CDC: changes are detected by querying the source, not by reading redo logs.

(Or)

An Oracle CDC tool (GoldenGate, Fivetran, Debezium, etc.) to capture redo logs and land a CDC feed in object storage or Kafka, then Lakeflow Spark Declarative Pipelines with AUTO CDC / AUTO CDC FROM SNAPSHOT to maintain SCD1/SCD2 Delta tables.
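
As a sketch of that second option: once the CDC tool lands change events (say, as JSON files), a declarative pipeline can maintain the target using the Python dlt API, where apply_changes is the Python counterpart of AUTO CDC in SQL. The column names (order_id, commit_scn, op), path, and JSON format are assumptions about what your CDC tool emits:

```python
import dlt
from pyspark.sql import functions as F

# Raw view over the CDC events landed by the external tool (format/path are placeholders).
@dlt.view(name="orders_cdc_raw")
def orders_cdc_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/oracle_cdc/orders/")
    )

# Target streaming table maintained from the change feed.
dlt.create_streaming_table("orders_silver")

dlt.apply_changes(
    target="orders_silver",
    source="orders_cdc_raw",
    keys=["order_id"],                          # primary key of the Oracle table
    sequence_by=F.col("commit_scn"),            # ordering column emitted by the CDC tool
    apply_as_deletes=F.expr("op = 'DELETE'"),   # how the tool flags deletes
    except_column_list=["op", "commit_scn"],    # metadata columns not stored in the target
    stored_as_scd_type=2,                       # keep history (use 1 for current-state only)
)
```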

For true CDC you'd use an external CDC tool plus AUTO CDC, and for simpler incremental loads you can use the Oracle query-based connector.

Hope this helps.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***