Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Managed tables and ADLS - infrastructure

cltj
New Contributor III

Hi all. I want to get this right and therefore I am reaching out to the community. 


We are on Azure and currently use one Azure Data Lake Storage (ADLS) account for development and one for production. These are connected to our dev and prod Databricks workspaces. We already use Unity Catalog and restrict access to the data lakes by workspace.

I have a few questions I would like to ask.
1. We are restructuring our infrastructure these days, and I need to understand whether using managed tables means that there will be only one ADLS.
2. Does this mean that we will not be able to manage the data lifecycle in ADLS (Hot-Cool-Archive)? If so, how do people manage the data lifecycle with Unity Catalog and managed tables?
3. Does anyone else use only managed tables in their solution (no external tables)? If yes, how is it working out?
4. At first it seems a bit risky to not be able to drop tables, but can't this be handled through good access management and quick response times (using UNDROP TABLE within 7 days)? The documentation also says "When a managed table is dropped, its underlying data is deleted from your cloud tenant within 30 days".

Please help me understand this. It's a big commitment and is hard to reverse.
Thanks in advance for all contributions.


1 REPLY

ossinova
Contributor II

I recommend you read this article (Managed vs External tables) and answer the following question:

  • Do I require direct access to the data from outside Azure Databricks clusters or Databricks SQL warehouses?
    • If yes, then external tables are your only option.

For the other questions, take a look at this article (Managed Storage Location). You can use more than one ADLS location. For instance, if you have three catalogs (dev_bronze, dev_silver, dev_gold), you can map each catalog to its corresponding ADLS location (adls-bronze, adls-silver, adls-gold). In your case it might make more sense to keep your existing ADLS and create three containers (Bronze, Silver, Gold, or the equivalent if you are not using the medallion architecture), then map each container to either a catalog or a schema.
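As a rough sketch of the container-per-catalog mapping described above, the snippet below builds one `CREATE CATALOG ... MANAGED LOCATION` statement per layer. The storage account name `mydevdatalake`, the container names, and both helper functions are hypothetical and only for illustration. It also assumes each path is already covered by an external location registered in Unity Catalog and that you hold the privileges needed to create catalogs with managed storage there.

```python
def abfss_url(container: str, storage_account: str, path: str = "") -> str:
    """Build an ADLS Gen2 URL: abfss://<container>@<account>.dfs.core.windows.net[/<path>]."""
    base = f"abfss://{container}@{storage_account}.dfs.core.windows.net"
    cleaned = path.strip("/")
    return f"{base}/{cleaned}" if cleaned else base


def create_catalog_sql(catalog: str, location: str) -> str:
    """Return a Databricks SQL statement creating a catalog with its own managed location."""
    return f"CREATE CATALOG IF NOT EXISTS {catalog} MANAGED LOCATION '{location}'"


# Hypothetical names: replace with your own storage account and containers.
STORAGE_ACCOUNT = "mydevdatalake"
LAYERS = ["bronze", "silver", "gold"]

# One catalog per layer, each pointing at its own container in the same account.
statements = [
    create_catalog_sql(f"dev_{layer}", abfss_url(layer, STORAGE_ACCOUNT))
    for layer in LAYERS
]

for stmt in statements:
    print(stmt)
    # In a Databricks notebook you could run it with: spark.sql(stmt)
```

The same idea should work at schema level with `CREATE SCHEMA ... MANAGED LOCATION`, if you would rather keep a single catalog per environment and split storage per schema.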

Managing the data lifecycle is a bit more complex. Read more here (in preview): (Archive Delta). Depending on your data volume, I would consider skipping lifecycle management altogether. Storage is cheap these days.
