
Integration patterns with Oracle database

HideK253
New Contributor

Hello everyone

I'd like to try integrating Databricks with an on-premises Oracle database, and I have the following fundamental questions about the architecture of that integration.

1) I believe the following are the main ways to integrate with an Oracle database, but are there any other options?

- Convert table data to CSV/Parquet via an ETL tool and ingest the files via COPY INTO / Auto Loader

- Use Lakehouse Federation to query the table data

- Use Lakeflow Connect query-based connectors to ingest the table data

2) Do Lakehouse Federation and the query-based connectors for Oracle use JDBC connections?

3) Is there any way to implement CDC for Oracle? I believe Oracle is not listed among the supported databases for Lakeflow CDC connectors.

 

Thank you

 

1 REPLY

Ashwin_DSA
Databricks Employee

Hi @HideK253,

Your list is essentially correct. The main Databricks-native patterns are export to files + COPY INTO / Auto Loader, Lakehouse Federation, and Lakeflow Connect; a minimal Auto Loader sketch of the file-based pattern is shown below. A couple of other options are listed after that.
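
Here is a rough sketch of that file-export pattern, assuming an ETL tool drops Parquet exports from Oracle into cloud storage and you run this from a Databricks notebook (where spark is predefined). All paths and table names are placeholders:

```python
# Minimal Auto Loader sketch: incrementally pick up Parquet files exported
# from Oracle and append them to a bronze Delta table.
# All paths and table names below are illustrative placeholders.
(
    spark.readStream.format("cloudFiles")                  # Auto Loader source
    .option("cloudFiles.format", "parquet")                # format of the exported files
    .option("cloudFiles.schemaLocation",
            "/Volumes/main/raw/_schemas/orders")           # where the inferred schema is tracked
    .load("/Volumes/main/raw/oracle_exports/orders/")      # landing folder for the exports
    .writeStream
    .option("checkpointLocation",
            "/Volumes/main/raw/_checkpoints/orders")
    .trigger(availableNow=True)                            # process new files, then stop
    .toTable("main.bronze.orders")                         # target Delta table
)
```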

  • Custom Spark JDBC jobs: spark.read.format("jdbc") from Oracle into Delta, often with your own watermark column for incremental loads (see the sketch after this list).
  • You can also pair third-party CDC tools (Oracle GoldenGate, Fivetran/HVR, Debezium+Kafka, etc.) with Databricks to land CDC data into cloud storage and then process it into Delta.
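
As a rough illustration of the custom JDBC approach, here is a sketch of an incremental pull keyed on a hypothetical updated_at watermark column. The host, schema, credentials, secret scope, and column names are all placeholders, the Oracle JDBC driver must be available on the cluster, and spark/dbutils come from the notebook environment:

```python
from pyspark.sql import functions as F

target = "main.bronze.orders"                 # placeholder Delta target
jdbc_url = "jdbc:oracle:thin:@//oracle-host.example.com:1521/ORCLPDB1"

# High-water mark already loaded; fall back to an early timestamp on the first run.
if spark.catalog.tableExists(target):
    last_ts = spark.table(target).agg(F.max("updated_at")).first()[0]
else:
    last_ts = "1970-01-01 00:00:00"

incremental = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("user", "dbx_reader")                                      # placeholder user
    .option("password", dbutils.secrets.get("oracle", "reader-pwd"))   # secret scope/key are placeholders
    .option("query",
            f"SELECT * FROM sales.orders "
            f"WHERE updated_at > TIMESTAMP '{last_ts}'")               # watermark predicate pushed to Oracle
    .load()
)

incremental.write.mode("append").saveAsTable(target)
```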

Regarding your question about Lakehouse Federation and the query-based connectors, the answer is yes for both. Lakehouse Federation pushes queries down via JDBC to the foreign database, including Oracle, and the Lakeflow Connect query-based connectors use Unity Catalog connections and Lakehouse Federation under the hood, so the Oracle connector is part of the same JDBC-based family. You don't manage the JDBC details directly; Databricks does.
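
To make the Federation part concrete, the setup looks roughly like the sketch below, issued as SQL from a notebook. The connection/catalog names and credentials are placeholders, and the exact Oracle option keys (host, port, user, password, service name) should be verified against the Lakehouse Federation docs; the point is that once the connection object exists, Databricks handles the JDBC layer for you:

```python
# Rough Lakehouse Federation setup for Oracle via spark.sql().
# Option keys are written from memory and should be checked against the docs;
# names and credentials are placeholders.
spark.sql("""
  CREATE CONNECTION oracle_onprem TYPE oracle
  OPTIONS (
    host 'oracle-host.example.com',
    port '1521',
    user 'dbx_reader',
    password secret('oracle', 'reader-pwd')   -- reference a secret rather than inlining a password
  )
""")

spark.sql("""
  CREATE FOREIGN CATALOG oracle_cat
  USING CONNECTION oracle_onprem
  OPTIONS (service_name 'ORCLPDB1')
""")

# Queries against the foreign catalog are pushed down to Oracle over JDBC.
spark.sql("SELECT COUNT(*) AS n FROM oracle_cat.sales.orders").show()
```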

You're correct: Oracle is not yet in the public GA list of Lakeflow CDC database connectors (those call out MySQL, PostgreSQL, and SQL Server). So the best options as of today are:

The Lakeflow Connect query-based connector for Oracle, which provides incremental loads via a cursor column (and supports SCD1/SCD2 and soft/hard deletes). Note that this is not log-based CDC: changes are detected by querying the source, not by reading redo logs.

(Or)

An Oracle CDC tool (GoldenGate, Fivetran, Debezium, etc.) to capture redo logs and land a CDC feed in object storage or Kafka, then Lakeflow Spark Declarative Pipelines with AUTO CDC / AUTO CDC FROM SNAPSHOT to maintain SCD1/SCD2 Delta tables.
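
As a sketch of that second option: once the CDC tool lands change events (say, as JSON files), a declarative pipeline can maintain the target using the Python dlt API, where apply_changes is the Python counterpart of AUTO CDC in SQL. The column names (order_id, commit_scn, op), path, and JSON format are assumptions about what your CDC tool emits:

```python
import dlt
from pyspark.sql import functions as F

# Raw view over the CDC events landed by the external tool (format/path are placeholders).
@dlt.view(name="orders_cdc_raw")
def orders_cdc_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/oracle_cdc/orders/")
    )

# Target streaming table maintained from the change feed.
dlt.create_streaming_table("orders_silver")

dlt.apply_changes(
    target="orders_silver",
    source="orders_cdc_raw",
    keys=["order_id"],                          # primary key of the Oracle table
    sequence_by=F.col("commit_scn"),            # ordering column emitted by the CDC tool
    apply_as_deletes=F.expr("op = 'DELETE'"),   # how the tool flags deletes
    except_column_list=["op", "commit_scn"],    # metadata columns not stored in the target
    stored_as_scd_type=2,                       # keep history (use 1 for current-state only)
)
```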

For true CDC you'd use an external CDC tool plus AUTO CDC, and for simpler incremental loads you can use the Oracle query-based connector.

Hope this helps.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***