
Change AWS S3 storage class for subset of schema

Newbienewbster
New Contributor II

I have a schema that has grown very large. It contains mainly two types of tables, and one of those types accounts for roughly 80% of the storage. Is there a way to set a policy for just those tables to transition them to a different storage class? Unity Catalog obfuscates the underlying table locations (otherwise I could simply use a prefix rule in an AWS S3 lifecycle policy...)

Any advice would be most appreciated 🙂

1 REPLY

mark_ott
Databricks Employee

Yes, it's possible to manage storage costs in Databricks and Unity Catalog by targeting specific tables for different storage classes, but Unity Catalog does add complexity since it abstracts the direct S3 (or ADLS/GCS) object paths from you. Here’s a comprehensive approach to address the scenario you described:

Direct S3 Path Workarounds

  • Unity Catalog Abstraction: Unity Catalog manages S3 locations internally for managed tables, so you don’t have direct control or visibility over prefixes per table to write a fine-grained AWS S3 Lifecycle Policy. This makes table-type-specific lifecycle management tricky if you rely solely on S3 object-level policies.

  • External (Unmanaged) Tables: If you register external tables (using CREATE TABLE ... LOCATION 's3://...'), you can specify unique S3 paths or prefixes for each type of table. Then, you can safely apply AWS S3 lifecycle policies to the prefix used by the "heavyweight" tables.
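
For illustration, here is a minimal sketch of that pattern in a Databricks notebook. The catalog, schema, table, and bucket names are placeholders, and a Unity Catalog external location covering the path must already exist:

```python
# Minimal sketch: register an external Delta table under a dedicated S3 prefix
# so that an S3 lifecycle rule can later target that prefix.
# All names and paths below are placeholders.
spark.sql("""
    CREATE TABLE main.analytics.big_events (
        event_id BIGINT,
        payload  STRING,
        event_ts TIMESTAMP
    )
    USING DELTA
    LOCATION 's3://my-bucket/heavy-tables/big_events/'
""")
```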

Possible Solutions in Unity Catalog

1. Separating Large Tables by External Table Location

  • For new large tables, use external tables in a separate S3 prefix.

  • Apply a lifecycle policy targeting that specific prefix.
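
On the AWS side, the matching lifecycle rule could be applied with boto3 along these lines. The bucket, prefix, storage class, and transition timing are all assumptions, and note that this call replaces the bucket's entire lifecycle configuration, so merge in any existing rules first:

```python
import boto3

# Sketch: transition everything under the heavy-table prefix to a cheaper
# storage class after 90 days. Bucket, prefix, and timing are placeholders.
# Caution: put_bucket_lifecycle_configuration overwrites the bucket's existing
# lifecycle configuration -- include any rules you already have.
s3 = boto3.client("s3")
s3.put_bucket_lifecycle_configuration(
    Bucket="my-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "heavy-tables-transition",
                "Filter": {"Prefix": "heavy-tables/"},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 90, "StorageClass": "GLACIER_IR"}
                ],
            }
        ]
    },
)
```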

2. Table Tagging and Automation

  • Use Unity Catalog table tags or table properties (TBLPROPERTIES) to annotate these table types.

  • Write an automated Databricks job (using Python/Scala/Shell notebooks or Databricks workflows) that:

    • Enumerates tables with the matching tag or property.

    • Resolves their actual storage URIs (possible today through commands such as DESCRIBE DETAIL or certain API calls, but the paths of managed tables aren’t an officially guaranteed contract; future Unity Catalog features may make this easier).

    • Moves their underlying data to the desired S3 prefix or storage class, then updates the table metadata.

  • This approach is advanced and requires caution to avoid data consistency issues; a rough sketch follows below.
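
A rough sketch of the enumeration step, assuming a hypothetical table property storage_tier='cold' as the marker. DESCRIBE DETAIL is Delta-specific, and the resolved paths of managed tables are informational only, not a supported contract:

```python
# Rough sketch: find tables in one schema marked (via a hypothetical
# TBLPROPERTIES key 'storage_tier') as cold, and resolve their storage paths.
catalog, schema = "main", "analytics"  # placeholders

for row in spark.sql(f"SHOW TABLES IN {catalog}.{schema}").collect():
    full_name = f"{catalog}.{schema}.{row.tableName}"
    props = {
        p["key"]: p["value"]
        for p in spark.sql(f"SHOW TBLPROPERTIES {full_name}").collect()
    }
    if props.get("storage_tier") == "cold":
        # DESCRIBE DETAIL returns the storage location for Delta tables.
        detail = spark.sql(f"DESCRIBE DETAIL {full_name}").first()
        print(full_name, detail.location)
```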

3. Contacting Databricks Support

  • Databricks is aware of the need for finer-grained storage policies in Unity Catalog-managed environments.

  • There may be preview features or recommended best practices not yet widely documented.

  • Request guidance (and file a feature request if needed) for granular storage lifecycle control per table type.

Key Points and Limitations

  • No Native Table-level Policy: Unity Catalog currently lacks a built-in way to apply different S3 lifecycle rules per table type for managed tables.

  • External Table Best Practice: If you need this flexibility now, use external tables with clearly separated S3 paths.

  • Manual Table Management: For existing managed tables, it’s not possible today to retroactively assign different storage classes without moving them out of Unity Catalog management and re-registering as external tables.
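
If you do decide to migrate an existing managed table, one hedged approach is a Delta deep clone into an external location, then repointing consumers and dropping the original once the clone is validated. Names and paths are placeholders:

```python
# Sketch: copy a managed Delta table to an external location with DEEP CLONE,
# yielding an external table whose S3 prefix a lifecycle rule can target.
spark.sql("""
    CREATE TABLE main.analytics.big_events_ext
    DEEP CLONE main.analytics.big_events
    LOCATION 's3://my-bucket/heavy-tables/big_events/'
""")

# After validating the clone (e.g., comparing row counts), repoint consumers
# and drop the managed original:
# spark.sql("DROP TABLE main.analytics.big_events")
```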


In summary:
For table-level storage class control in Unity Catalog, use external tables with distinct S3 prefixes for the heavy-storage table types, then apply storage class or lifecycle policies to those prefixes in S3. For fully managed tables, current Unity Catalog abstractions prevent this, so file a feature request or consult Databricks about roadmap and workaround options.