MohanaBasak, Databricks Employee

Iceberg Interoperability (Databricks <> Snowflake)

Seamless interoperability between Databricks Unity Catalog (UC) and external Apache Iceberg ecosystems, such as Snowflake Horizon, enables organizations to use the right engine for the right workload without being locked into a single technology stack. By standardizing on Iceberg and establishing the right catalog strategy, teams can achieve governed, engine-agnostic data sharing while avoiding costly and unnecessary data duplication.

In this post, we’ll explore the two main approaches to integrating UC with other Iceberg environments — Foreign Iceberg tables and Managed Iceberg tables — and walk through when to choose each, how to configure them, and the operational considerations to keep in mind.


Understanding Apache Iceberg and Catalogs

Apache Iceberg is an open table format for large-scale data lakes that supports ACID transactions, schema evolution, and branching. However, Iceberg requires a catalog to track table metadata, coordinate transactions, and enforce governance.

A catalog is responsible for:

  • Discovery – knowing where tables live and what metadata they reference.
  • Concurrency control – managing atomic snapshot updates across engines.
  • Governance – controlling permissions, lineage, and auditing.
  • Optimization hooks – triggering compaction, clustering, and snapshot expiration.

In a multi-engine architecture, the choice of catalog becomes just as important as the table format itself. The long-term vision for an open lakehouse is to decouple compute from storage so you can choose the best engine for the workload—while letting the catalog define governance, lifecycle, and optimization.

Unity Catalog supports two ownership models for Iceberg tables:

  • Managed Tables (UC-owned): Unity Catalog owns the table’s lifecycle — handling reads, writes, and automated maintenance such as Predictive Optimization.
  • Foreign Tables (Externally-owned): An external catalog, such as Snowflake Horizon, manages the lifecycle. Unity Catalog provides secure, governed read access without taking ownership of the table.

Choosing the Right Approach — Quick Reference

| Requirement | Pick |
| --- | --- |
| Only read in Databricks; lifecycle owned elsewhere | Foreign Iceberg (Catalog Federation) |
| Full read/write from Databricks and other engines, with UC governance & automation | Managed Iceberg + UC REST Catalog |
| Need automated maintenance (optimize, expire snapshots) | Managed Iceberg + Predictive Optimization |
| Need to evolve clustering strategy without rewrite | Managed Iceberg + Liquid Clustering |

Two Approaches to Iceberg Interoperability in UC

1. Foreign Iceberg Tables (read-only from Databricks)

Foreign Iceberg tables are owned and maintained by an external catalog — for example, Snowflake Horizon or AWS Glue — but made visible in Unity Catalog. This approach is ideal when Databricks needs read-only access to tables whose lifecycle is managed elsewhere.

The most robust way to access foreign Iceberg is Catalog Federation, where UC connects directly to the external catalog and mirrors table metadata. Each query checks for the latest snapshot from the foreign catalog before execution, ensuring freshness without losing UC governance, lineage tracking, and audit visibility.

Configuring Foreign Iceberg in UC

Step 1. Create External Location(s) (authorized paths to the Iceberg data)

  • Lets UC read the cloud storage referenced by the foreign catalog.
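For example, a minimal sketch in Databricks SQL (the storage credential iceberg_cred and the bucket path are placeholders):

-- Placeholder names; point the URL at the storage the foreign catalog references
CREATE EXTERNAL LOCATION IF NOT EXISTS snowflake_iceberg_loc
URL 's3://my-iceberg-bucket/snowflake/'
WITH (STORAGE CREDENTIAL iceberg_cred);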

Step 2. Create a Connection to the Foreign Catalog

  • E.g., Snowflake Horizon: connection must have USAGE on database, schema, external volume, and the Iceberg table(s) to resolve current metadata locations.
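As a sketch, the connection might be created like this from Databricks SQL; the host, user, and password values are placeholders, and the exact OPTIONS keys accepted depend on your DBR version and chosen authentication method:

-- Illustrative values only
CREATE CONNECTION IF NOT EXISTS snowflake_horizon_conn TYPE snowflake
OPTIONS (
  host '<account-identifier>.snowflakecomputing.com',
  port '443',
  user '<service-user>',
  password '<secret>'
);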

Step 3. Create a Federated Catalog in UC

  • Point it at the connection + authorize it to the external locations from Step 1.
  • Every query checks for metadata freshness before proceeding.
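A sketch of the catalog creation, using the connection and external location from the previous steps (names are placeholders, and the authorized_paths option key is an assumption that may differ by release):

CREATE FOREIGN CATALOG IF NOT EXISTS federated_catalog
USING CONNECTION snowflake_horizon_conn
OPTIONS (
  database '<snowflake-database>',
  authorized_paths 's3://my-iceberg-bucket/snowflake/'
);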
Once the federated catalog exists, the table can be queried like any other UC table:

SELECT * FROM federated_catalog.federated_schema.iceberg_table;

 

2. Managed Iceberg Tables (full interoperability, read/write)

Managed Iceberg tables are created and governed by Unity Catalog. UC is the system of record, and all engines — including Snowflake, Trino, Flink, or Spark — access the tables through the UC Iceberg REST Catalog. This setup supports full read/write interoperability while preserving UC’s centralized governance.

Managed Iceberg offers:

  • Single source of truth for data and metadata.
  • Predictive Optimization to automatically handle compaction, snapshot expiration, and other maintenance tasks (see the sketch after this list).
  • Liquid Clustering for evolving data layout strategies without rewriting historical data.
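Predictive Optimization is often inherited from the account-level setting, but it can also be enabled explicitly per catalog; a minimal sketch against the main catalog used in the examples below:

-- Turns on automated maintenance for managed tables in this catalog
ALTER CATALOG main ENABLE PREDICTIVE OPTIMIZATION;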

Configuring Managed Iceberg in UC

Step 1. Create the table

CREATE OR REPLACE TABLE main.schema.iceberg_table (c1 INT)
USING iceberg;

(or)

df.write.format("iceberg").saveAsTable("main.schema.iceberg_table")

If you try to specify a LOCATION on a managed Iceberg table, Databricks will error — UC manages the storage.

Step 2 (Optional but recommended). Enable Liquid Clustering

ALTER TABLE main.schema.iceberg_table
CLUSTER BY (c1);

Note: Currently, to make CLUSTER BY work on Iceberg tables in UC, you must first set the following table properties:

ALTER TABLE main.schema.iceberg_table
SET TBLPROPERTIES (
  'delta.enableDeletionVectors' = 'false',
  'delta.enableRowTracking' = 'false'
);

This extra step will not be required in the near future — with Iceberg v3 you’ll be able to simply run CLUSTER BY without the table property configuration.

Step 3. Use standard DML

INSERT INTO main.schema.iceberg_table VALUES (11);
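Updates, deletes, and merges work the same way on DBR versions that support Iceberg writes; for example, against the table created above:

UPDATE main.schema.iceberg_table SET c1 = 12 WHERE c1 = 11;

MERGE INTO main.schema.iceberg_table AS t
USING (SELECT 13 AS c1) AS s
ON t.c1 = s.c1
WHEN NOT MATCHED THEN INSERT (c1) VALUES (s.c1);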

Step 4. Enable external data access (metastore level)

  • UC vends short-lived credentials so external engines can access the underlying storage. A metastore admin enables this via the “External data access” toggle in the metastore settings.


Step 5. Grant EXTERNAL USE SCHEMA (it’s not in ALL PRIVILEGES)

  • Grant at catalog or schema level to the service principal / user.
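For example, granting at the schema level to a hypothetical service principal:

-- The principal name is a placeholder
GRANT EXTERNAL USE SCHEMA ON SCHEMA main.schema TO `snowflake-reader-sp`;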


Step 6. Configure the Iceberg REST Catalog client

Auth Option A — Service Principal

CREATE OR REPLACE CATALOG INTEGRATION unity_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<workspace-uri>/api/2.1/unity-catalog/iceberg-rest'
    WAREHOUSE = '<uc-catalog-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_TOKEN_URI = 'https://<workspace-uri>/oidc/v1/token'
    OAUTH_CLIENT_ID = '<client-id>'
    OAUTH_CLIENT_SECRET = '<client-secret>'
    OAUTH_ALLOWED_SCOPES = ('all-apis')
  )
  ENABLED = TRUE;

Auth Option B — Personal Access Token

CREATE OR REPLACE CATALOG INTEGRATION unity_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<workspace-uri>/api/2.1/unity-catalog/iceberg-rest'
    WAREHOUSE = '<uc-catalog-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<personal access token>'
  )
  ENABLED = TRUE;
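With either integration in place, the UC table can be surfaced in Snowflake as a catalog-linked Iceberg table and queried directly; a sketch with placeholder database and schema names:

CREATE ICEBERG TABLE my_db.my_schema.iceberg_table
  CATALOG = 'unity_catalog'
  CATALOG_TABLE_NAME = 'iceberg_table';

SELECT * FROM my_db.my_schema.iceberg_table;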

ℹ️ Additional note for Snowflake writes

  • Vended credentials currently work for reads only from Snowflake into UC-managed Iceberg.
  • To write to a UC-managed Iceberg table from Snowflake (a sketch follows this list):
    1. Create an external volume on the Snowflake side pointing to the UC table’s underlying storage.
    2. Create the catalog integration in Snowflake the same way as for reads, but omit ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS.
    3. When creating the Iceberg table in Snowflake, explicitly reference that external volume.
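A sketch of that write-side setup in Snowflake, with placeholder names and paths; the external volume must resolve to the UC table’s underlying storage, and unity_catalog_writes is assumed to be an integration created without vended credentials:

-- 1. External volume over the UC-managed storage location
CREATE EXTERNAL VOLUME uc_iceberg_vol
  STORAGE_LOCATIONS = ((
    NAME = 'uc-managed'
    STORAGE_PROVIDER = 'S3'
    STORAGE_BASE_URL = 's3://<uc-managed-bucket>/<table-path>/'
    STORAGE_AWS_ROLE_ARN = '<iam-role-arn>'
  ));

-- 2. Catalog-linked table that names the external volume explicitly
CREATE ICEBERG TABLE my_db.my_schema.iceberg_table
  CATALOG = 'unity_catalog_writes'
  CATALOG_TABLE_NAME = 'iceberg_table'
  EXTERNAL_VOLUME = 'uc_iceberg_vol';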

Common Pitfalls to Avoid

  • Missing authorized paths in federated catalogs will cause queries to fall back to query federation, potentially incurring double compute costs.
  • Forgetting EXTERNAL USE SCHEMA when granting external engines access will block reads/writes. It’s not implied by ALL PRIVILEGES.
  • Not setting a root storage location for a federated catalog in UC; this is required to persist UC’s metadata about the foreign catalog. This is separate from authorized paths.
  • Ignoring version requirements: reads need DBR 15.4 LTS+, writes need DBR 16.4 LTS+.
  • Not enabling Predictive Optimization can lead to performance degradation over time.

Conclusion

Apache Iceberg’s open, engine-agnostic design enables true cross-platform interoperability—but only if paired with the right catalog strategy. By deciding early whether a dataset should be treated as Foreign Iceberg or Managed Iceberg, and by following best practices for federation, REST Catalog integration, and governance, you can build a lakehouse architecture that supports multiple engines without sacrificing performance, security, or maintainability.