Hello everyone,
I am responsible for designing and implementing a Lakehouse architecture in an industrial company.
I am currently facing some challenges regarding the initial ingestion of data from our on-premises Oracle database into Databricks.
The data comes from production systems and is actively used by several applications. My main concern is that the initial load is very large, and I'm worried about impacting database performance or even causing issues if we extract all the data at once.
For the ongoing ingestion, the data volume will be much smaller and continuous, so that part is not an issue.
However, I would really appreciate advice or best practices on how to safely handle the first large-scale ingestion (initial load) without overloading or disrupting the Oracle database.
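For context, the rough pattern I have been considering is chunked extraction over a numeric key range, so each query only touches a bounded slice of the table and can be throttled or scheduled off-peak. A minimal sketch of the chunk planning (table and column names are placeholders, not our real schema):

```python
# Sketch: split the initial load into bounded key-range chunks so each
# extract query reads only a slice of the table, instead of one
# full-table scan against the production Oracle database.

def chunk_ranges(min_id, max_id, chunk_size):
    """Split [min_id, max_id] into inclusive (lo, hi) ranges of chunk_size keys."""
    ranges = []
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size - 1, max_id)
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges

# Each range would become one bounded query, e.g. (placeholder names):
#   SELECT * FROM prod.orders WHERE order_id BETWEEN :lo AND :hi
# Chunks can then be run sequentially off-peak, with pauses in between,
# so the load on Oracle stays predictable.
print(chunk_ranges(1, 10, 4))  # → [(1, 4), (5, 8), (9, 10)]
```

Would something like this (or Spark's built-in JDBC partitioning with `partitionColumn` / `lowerBound` / `upperBound` / `numPartitions`) be a reasonable direction, or is there a better pattern for this?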
What approaches, tools, or patterns would you recommend in this situation?
Thank you in advance for your help.