โ06-02-2025 07:04 AM
Hello!
I'm new to Databricks.
Assume, I need to migrate 2 Tb Oracle Datamart to Databricks on Azure. Serverless SQL Warehouse seems as a valid choice.
What is a better option ( cost vs performance) to store the data?
Should I upload Oracle Extracts to Azure BLOB and create External tables?
Or it is better to use COPY INTO FROM to create managed tables?
Data size will grow by ~1 Tb per year.
Thank you!
โ06-02-2025 08:38 AM
Hi @Curious-mind ,
Welcome to using Databricks! For your use case, I think creating managed tables using COPY INTO are going to be more performative which will lead to better cost scalability as well. While external tables could initially be a bit cheaper, managed Delta tables offer significant performance and usability benefits that pay off as your data grows.
Here are a few benefits that managed tables offer over external tables:
Faster queries with indexing, caching, and Delta optimizations
Easier schema enforcement, versioning, and time travel
Seamless use with Unity Catalog, RBAC, and Serverless SQL
Better support for optimization (OPTIMIZE
, VACUUM
, etc.)
โ06-02-2025 08:38 AM
Hi @Curious-mind ,
Welcome to using Databricks! For your use case, I think creating managed tables using COPY INTO are going to be more performative which will lead to better cost scalability as well. While external tables could initially be a bit cheaper, managed Delta tables offer significant performance and usability benefits that pay off as your data grows.
Here are a few benefits that managed tables offer over external tables:
Faster queries with indexing, caching, and Delta optimizations
Easier schema enforcement, versioning, and time travel
Seamless use with Unity Catalog, RBAC, and Serverless SQL
Better support for optimization (OPTIMIZE
, VACUUM
, etc.)
โ06-03-2025 07:10 AM
If I get it right we can use Default storage for Managed Data or Set a managed storage location for a catalog :
CREATE CATALOG <catalog-name> MANAGED LOCATION 'abfss://<container-name>@<storage-account>.dfs.core.windows.net/<path>/<directory>';
What are the reasons to create our own Managed Location vs using Default one?
โ06-03-2025 11:08 AM
Hi @Curious-mind ,
The main benefit around creating your own managed location is just better isolation and management. It will depend on how large your data is, but if you want the data to be stored in specific locations by catalog, rather than having any new catalog created land in the root of the metastore, than specifying a managed location is what you'd want to do.
โ06-02-2025 10:05 AM - edited โ06-02-2025 10:15 AM
Hi @Shua42,
Thank you for the prompt reply.
For managed tables initial load:
Can I simply run a COPY command::
COPY INTO DELTA_TABLE FROM 'abfss://container@storageAccount.dfs.core.windows.net/base/path'
FILEFORMAT = CSV
FILES = ('f1.csv', 'f2.csv',...)
or it is better use Auto-Loader?
Some source Oracle tables can have 100M+ rows
โ06-02-2025 10:14 AM
You got it. Running the COPY INTO is good for the initial load as it's optimized for bulk loads. You'll want to use Auto-loader going forward to incrementally process new rows.
Passionate about hosting events and connecting people? Help us grow a vibrant local communityโsign up today to get started!
Sign Up Now