01-30-2025 10:11 PM - edited 01-30-2025 10:12 PM
Hey Databricks enthusiasts!
Migrating to Unity Catalog? Understanding the difference between External S3 Location Tables and Managed Tables is crucial for optimizing governance, security, and cost efficiency.
🔹 External S3 Location Tables
✔️ Data remains in an existing S3 bucket, with Databricks referencing it externally.
✔️ Unity Catalog tracks metadata, but does not control the data lifecycle.
✔️ Ideal for multi-platform access or when organizations prefer to manage storage independently.
Challenges: Lacks full governance, lifecycle control, and performance optimizations offered by Databricks-managed storage.
🔹 Managed Tables
✔️ Data is fully managed by Databricks, stored within its managed storage.
✔️ Unity Catalog controls both metadata and the physical data, ensuring strong governance, security, and lineage tracking.
✔️ Best suited for AI/ML workloads, compliance-driven use cases, and automated data lifecycle management.
Considerations: Requires migrating data into Databricks-managed storage, impacting existing workflows.
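To make the difference concrete, here is a minimal sketch of how the two table types differ at creation time. In Databricks SQL, a `CREATE TABLE` with a `LOCATION` clause produces an external table, while omitting it produces a managed one. The helper `create_table_ddl`, the table names, and the bucket path are all hypothetical and only build the SQL strings. On a real workspace you would pass them to `spark.sql(...)`, and the S3 path would need to be covered by an external location registered in Unity Catalog.

```python
from typing import Optional


def create_table_ddl(table: str, columns: dict[str, str],
                     location: Optional[str] = None) -> str:
    """Build a CREATE TABLE statement (hypothetical helper).

    With no location the table is managed; with a location it is external.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    ddl = f"CREATE TABLE {table} ({cols})"
    if location is not None:
        # The LOCATION clause is what makes the table external.
        ddl += f" LOCATION '{location}'"
    return ddl


# Placeholder schema and names, for illustration only.
columns = {"order_id": "BIGINT", "amount": "DECIMAL(10,2)"}

managed_ddl = create_table_ddl("main.sales.orders", columns)
external_ddl = create_table_ddl(
    "main.sales.orders_ext", columns,
    location="s3://example-bucket/orders",  # hypothetical bucket
)

print(managed_ddl)
print(external_ddl)
```

Dropping the managed table also removes its data files, whereas dropping the external one only removes the metadata and leaves the files in the bucket.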
Which approach works best for your use case? Let's discuss the trade-offs and strategies for seamless Unity Catalog migration!
01-30-2025 11:27 PM
There are two use cases where it's worth using external tables:
In other cases it's better to use managed tables, especially when you want to automate governance and maintenance on them, for example with Liquid Clustering.
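For anyone who hasn't used Liquid Clustering yet, this is roughly what enabling it looks like. The sketch only builds the `ALTER TABLE ... CLUSTER BY` statements. The helper `cluster_by_ddl`, the table name, and the column names are hypothetical, and you would run the resulting strings with `spark.sql(...)` on a Delta table.

```python
def cluster_by_ddl(table: str, cluster_columns: list[str]) -> str:
    """Build the statement that sets or clears clustering keys (hypothetical helper)."""
    if not cluster_columns:
        # CLUSTER BY NONE removes the clustering keys from the table.
        return f"ALTER TABLE {table} CLUSTER BY NONE"
    keys = ", ".join(cluster_columns)
    return f"ALTER TABLE {table} CLUSTER BY ({keys})"


# Placeholder table and columns, for illustration only.
enable_stmt = cluster_by_ddl("main.sales.orders", ["order_date", "customer_id"])
disable_stmt = cluster_by_ddl("main.sales.orders", [])

print(enable_stmt)
print(disable_stmt)
```

Changing the keys affects newly written data. Existing files are reclustered as `OPTIMIZE` runs on the table.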
02-02-2025 02:00 PM
Hi MariuszK,
I appreciate your note. We had a discussion with a few of our internal architects as well as a Databricks architect. Based on their recommendations, tables that are frequently accessed (such as Gold layer tables for reporting, tables used by ML jobs, and real-time streaming tables) should be created as managed tables. This approach ensures better performance, optimization, and enhanced governance and security controls, including support for serverless jobs. Thanks.
02-02-2025 02:23 PM
Hey!
I hope I'm not too late, and I'd like to share my opinion. While it's true that managed services offer certain advantages over external tables, you should keep in mind that Databricks services often come with an associated cost, such as Predictive Optimization. I recommend reviewing your workflow and checking the associated costs here: Databricks Pricing.
It's important to note that Databricks operates on a pay-as-you-go model, but having control over the service and being able to manage resources through your cloud provider (for example, horizontal autoscaling and cluster size adjustments) often results in a lower overall bill. I recommend conducting a cost analysis to determine which processes could benefit from migrating to managed services and which ones might not be worth it.
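As a rough illustration of the kind of cost analysis I mean, here is a small sketch that compares two ways of running the same workload. Every number is a made-up placeholder and not a real Databricks or AWS price, and the function `monthly_cost` and its parameters are hypothetical. The assumption is that a self-managed cluster pays a DBU charge plus a separate cloud infrastructure charge, while the managed option has a higher DBU rate with infrastructure folded in and finishes in fewer hours.

```python
def monthly_cost(hours: float, dbu_per_hour: float,
                 dbu_rate: float, infra_per_hour: float = 0.0) -> float:
    """Estimate monthly cost as hours * (DBU charge + cloud infrastructure charge)."""
    return hours * (dbu_per_hour * dbu_rate + infra_per_hour)


# Placeholder inputs only; replace them with figures from your own billing data.
self_managed = monthly_cost(hours=100, dbu_per_hour=4, dbu_rate=0.5,
                            infra_per_hour=1.0)
managed = monthly_cost(hours=60, dbu_per_hour=4, dbu_rate=1.0)

cheaper = "managed" if managed < self_managed else "self-managed"
print(f"self-managed: {self_managed:.2f}, managed: {managed:.2f}, cheaper: {cheaper}")
```

With these particular placeholders the managed option happens to come out lower, which is exactly why it's worth plugging in your own runtimes and rates per workload instead of assuming one side always wins.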
In general, managed services provide better performance, optimization, and enhanced governance and security controls, including support for serverless jobs, but everything comes at a much higher cost.
02-02-2025 07:04 PM
Thank you for sharing your insights! You make a great point about the cost considerations associated with managed services in Databricks. While managed tables offer advantages in terms of performance, optimization, governance, and security, it's always important to evaluate cost implications based on specific workloads.
A cost-benefit analysis can help determine which processes truly benefit from managed services versus those that can be optimized through cloud provider resource management (e.g., horizontal autoscaling, cluster size adjustments). We'll take your feedback into account and ensure the right balance between cost efficiency and operational benefits.
Appreciate your input!