4 weeks ago - last edited 4 weeks ago
Hey Databricks enthusiasts!
Migrating to Unity Catalog? Understanding the difference between External S3 Location Tables and Managed Tables is crucial for optimizing governance, security, and cost efficiency.
🔹External S3 Location Tables
✔️Data remains in an existing S3 bucket, with Databricks referencing it externally.
✔️Unity Catalog tracks metadata, but does not control the data lifecycle.
✔️Ideal for multi-platform access or when organizations prefer to manage storage independently.
❗Challenges: Lacks full governance, lifecycle control, and performance optimizations offered by Databricks-managed storage.
🔹Managed Tables
✔️Data is fully managed by Databricks and stored in Unity Catalog managed storage (a cloud storage location configured for the metastore, catalog, or schema).
✔️Unity Catalog controls both metadata and the physical data, ensuring strong governance, security, and lineage tracking.
✔️Best suited for AI/ML workloads, compliance-driven use cases, and automated data lifecycle management.
❗Considerations: Requires migrating data into Databricks-managed storage, impacting existing workflows.
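In DDL terms, the difference between the two table types comes down to whether the `CREATE TABLE` statement carries a `LOCATION` clause. Here is a minimal sketch that builds both statements as strings; the helper `create_table_sql`, the catalog, schema, and table names, and the bucket path are all hypothetical and only there for illustration.

```python
from typing import Optional


def create_table_sql(table: str, columns: str, location: Optional[str] = None) -> str:
    """Build a Unity Catalog CREATE TABLE statement.

    With a location the table is external; without one it is managed.
    """
    ddl = f"CREATE TABLE IF NOT EXISTS {table} ({columns})"
    if location is not None:
        # External table: Unity Catalog registers the metadata,
        # while the data files stay at this path.
        ddl += f" LOCATION '{location}'"
    return ddl


# Hypothetical names and path, for illustration only.
external_ddl = create_table_sql(
    "main.bronze.events_ext",
    "id BIGINT, ts TIMESTAMP",
    location="s3://example-bucket/bronze/events",
)
managed_ddl = create_table_sql("main.gold.events", "id BIGINT, ts TIMESTAMP")

print(external_ddl)
print(managed_ddl)
```

In a Databricks notebook you would pass each string to `spark.sql(...)`. The S3 path of an external table has to be covered by a Unity Catalog external location. Dropping an external table removes only the metadata, while dropping a managed table also removes the underlying data files.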
Which approach works best for your use case? Let’s discuss the trade-offs and strategies for seamless Unity Catalog migration!
Labels: Databricks Unity Catalog
Accepted Solutions
4 weeks ago
There are two use cases where it's worth using external tables:
- Bronze layer: when an external tool ingests data by writing files directly to storage.
- Integration with external services that cannot integrate with UC and need to read files directly from storage.
In other cases it's better to use managed tables, especially when you want to automate table maintenance on them, for example with Liquid Clustering.
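The rule of thumb above can be written down as a small helper, together with the statement that sets liquid clustering keys on a table. This is a rough sketch and not an official decision procedure; the function names, flags, table name, and column names are hypothetical.

```python
from typing import Sequence


def recommended_table_type(ingested_by_external_tool: bool,
                           read_by_non_uc_service: bool) -> str:
    """Return 'external' for the two cases listed above, otherwise 'managed'."""
    if ingested_by_external_tool or read_by_non_uc_service:
        return "external"
    return "managed"


def liquid_clustering_sql(table: str, keys: Sequence[str]) -> str:
    """Build an ALTER TABLE statement that sets liquid clustering keys."""
    return f"ALTER TABLE {table} CLUSTER BY ({', '.join(keys)})"


# Hypothetical Gold table that no external tool writes to or reads from.
choice = recommended_table_type(ingested_by_external_tool=False,
                                read_by_non_uc_service=False)
statement = liquid_clustering_sql("main.gold.events", ["event_date", "id"])

print(choice)
print(statement)
```

Databricks also documents `CLUSTER BY AUTO`, which lets it choose the clustering keys for Unity Catalog managed tables; check whether your runtime version and Predictive Optimization settings support it before relying on it.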
3 weeks ago
Hi MariuszK,
I appreciate your note. We discussed this with a few of our internal Databricks architects as well as an architect from Databricks. Based on their recommendations, frequently accessed tables (such as Gold layer tables for reporting, tables used by ML jobs, and real-time streaming tables) should be created as managed tables. This approach gives better performance and optimization along with stronger governance and security controls, including support for serverless jobs. Thanks.
3 weeks ago
Hey!
I hope I’m not too late, and I’d like to share my opinion. While it’s true that managed services offer certain advantages over external tables, keep in mind that Databricks features such as Predictive Optimization often come with an associated cost. I recommend reviewing your workflow and checking the associated costs here: Databricks Pricing.
It’s important to note that Databricks operates on a pay-as-you-go model. Even so, having control over the service and being able to manage resources through your cloud provider (for example, horizontal autoscaling and cluster size adjustments) often results in a lower overall bill. I recommend conducting a cost analysis to determine which processes could benefit from migrating to managed services and which ones might not be worth it.
In general, managed services provide better performance, optimization, and enhanced governance and security controls, including support for serverless jobs, but everything comes at a much higher cost.
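To make the cost analysis concrete, here is a minimal sketch of how the two options could be compared for one workload. The cost model (DBU consumption times DBU price, plus any cloud infrastructure billed separately) is a simplifying assumption, the class `ComputeOption` is hypothetical, and every number is a placeholder rather than a real price. Take actual rates from the Databricks Pricing page and your cloud bill.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComputeOption:
    """One way of running a workload; every number is supplied by the caller."""
    name: str
    dbu_per_hour: float    # DBUs the workload consumes per hour
    price_per_dbu: float   # price per DBU for this compute type
    infra_per_hour: float  # cloud VM cost billed separately (0.0 if bundled)

    def monthly_cost(self, hours: float) -> float:
        return hours * (self.dbu_per_hour * self.price_per_dbu + self.infra_per_hour)


# Placeholder figures only, not real prices.
options = [
    ComputeOption("self-managed cluster", dbu_per_hour=4.0, price_per_dbu=0.25, infra_per_hour=1.0),
    ComputeOption("managed service", dbu_per_hour=4.0, price_per_dbu=0.75, infra_per_hour=0.0),
]
hours_per_month = 100.0

costs = {o.name: o.monthly_cost(hours_per_month) for o in options}
for name, cost in sorted(costs.items(), key=lambda kv: kv[1]):
    print(f"{name}: {cost:.2f}")
```

Which option comes out cheaper depends entirely on the inputs, so run it per workload with your own figures rather than reading anything into the placeholder result.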
🙂
3 weeks ago
Thank you for sharing your insights! You make a great point about the cost considerations associated with managed services in Databricks. While managed tables offer advantages in terms of performance, optimization, governance, and security, it’s always important to evaluate cost implications based on specific workloads.
A cost-benefit analysis can help determine which processes truly benefit from managed services versus those that can be optimized through cloud provider resource management (e.g., horizontal autoscaling, cluster size adjustments). We’ll take your feedback into account and ensure the right balance between cost efficiency and operational benefits.
Appreciate your input!

