Seamless interoperability between Databricks Unity Catalog (UC) and external Apache Iceberg ecosystems, such as Snowflake Horizon, enables organizations to use the right engine for the right workload without being locked into a single technology stack. By standardizing on Iceberg and establishing the right catalog strategy, teams can achieve governed, engine-agnostic data sharing while avoiding costly and unnecessary data duplication.
In this post, we’ll explore the two main approaches to integrating UC with other Iceberg environments: Foreign Iceberg tables and Managed Iceberg tables. We’ll walk through when to choose each, how to configure them, and the operational considerations to keep in mind.
Apache Iceberg is an open table format for large-scale data lakes that supports ACID transactions, schema evolution, and branching. However, Iceberg requires a catalog to track table metadata, coordinate transactions, and enforce governance.
A catalog is responsible for:
- Tracking each table’s current metadata pointer
- Coordinating atomic commits so concurrent writers don’t corrupt a table
- Enforcing access control, auditing, and governance across engines
In a multi-engine architecture, the choice of catalog becomes just as important as the table format itself. The long-term vision for an open lakehouse is to decouple compute from storage so you can choose the best engine for the workload—while letting the catalog define governance, lifecycle, and optimization.
| Requirement | Pick |
|---|---|
| Only read in Databricks; lifecycle owned elsewhere | ✅ Foreign Iceberg (Catalog Federation) |
| Full read/write from Databricks and other engines, with UC governance & automation | ✅ Managed Iceberg + UC REST Catalog |
| Need automated maintenance (optimize, expire snapshots) | ✅ Managed Iceberg + Predictive Optimization |
| Need to evolve clustering strategy without rewrite | ✅ Managed Iceberg + Liquid Clustering |
Foreign Iceberg tables are owned and maintained by an external catalog — for example, Snowflake Horizon or AWS Glue — but made visible in Unity Catalog. This approach is ideal when Databricks needs read-only access to tables whose lifecycle is managed elsewhere.
The most robust way to access foreign Iceberg tables is Catalog Federation, where UC connects directly to the external catalog and mirrors table metadata. Each query checks the foreign catalog for the latest snapshot before execution, ensuring freshness without losing UC governance, lineage tracking, and audit visibility.
Step 1. Create External Location(s) (authorized paths to the Iceberg data)
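A minimal sketch of Step 1, using hypothetical names (iceberg_data_loc, my_storage_cred) and an illustrative S3 path:
-- Hypothetical names and path; the storage credential must already exist in UC.
CREATE EXTERNAL LOCATION IF NOT EXISTS iceberg_data_loc
  URL 's3://my-bucket/iceberg/'
  WITH (STORAGE CREDENTIAL my_storage_cred);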
Step 2. Create a Connection to the Foreign Catalog
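A sketch of Step 2, assuming the foreign catalog exposes an Iceberg REST endpoint. The connection type and option keys below are illustrative placeholders, so consult the Lakehouse Federation documentation for the exact names your catalog provider (Snowflake Horizon, AWS Glue, etc.) requires:
-- Illustrative only; option keys vary by catalog provider.
CREATE CONNECTION iceberg_rest_conn TYPE iceberg_rest
OPTIONS (
  uri 'https://<external-catalog-endpoint>/api/catalog',
  oauth_client_id '<client-id>',
  oauth_client_secret '<client-secret>'
);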
Step 3. Create a Federated Catalog in UC
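Step 3 then mirrors the external catalog into UC. A sketch reusing the names from the earlier snippets; authorized_paths ties the federated catalog to the external location(s) created in Step 1:
-- Names carried over from the sketches above.
CREATE FOREIGN CATALOG federated_catalog
USING CONNECTION iceberg_rest_conn
OPTIONS (authorized_paths 's3://my-bucket/iceberg/');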
Once the federated catalog is in place, its tables can be queried like any other UC table:
SELECT * FROM federated_catalog.federated_schema.iceberg_table;
Managed Iceberg tables are created and governed by Unity Catalog. UC is the system of record, and all engines — including Snowflake, Trino, Flink, or Spark — access the tables through the UC Iceberg REST Catalog. This setup supports full read/write interoperability while preserving UC’s centralized governance.
Managed Iceberg offers:
- Full read/write access from external engines such as Snowflake, Trino, Flink, and Spark via the UC Iceberg REST Catalog
- Automated maintenance (optimize, expire snapshots) through Predictive Optimization
- Liquid Clustering, so the clustering strategy can evolve without rewriting the table
- Centralized UC governance, lineage, and auditing
Step 1. Create the table
CREATE OR REPLACE TABLE main.schema.iceberg_table (c1 INT)
USING iceberg;
or, equivalently, with the DataFrame API:
df.write.format("iceberg").saveAsTable("main.schema.iceberg_table")
If you try to specify a LOCATION on a managed Iceberg table, Databricks will error — UC manages the storage.
Step 2 (Optional but recommended) Enable Liquid Clustering
ALTER TABLE main.schema.iceberg_table
CLUSTER BY (c1);
Note: Currently, to make CLUSTER BY work on Iceberg tables in UC, you must first set the following table properties:
ALTER TABLE main.schema.iceberg_table
SET TBLPROPERTIES (
  'delta.enableDeletionVectors' = false,
  'delta.enableRowTracking' = false
);
This extra step will not be required in the near future — with Iceberg v3 you’ll be able to simply run CLUSTER BY without the table property configuration.
Step 3. Use standard DML
INSERT INTO main.schema.iceberg_table VALUES (11);
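Updates and deletes work the same way. A quick illustration (the values are arbitrary):
UPDATE main.schema.iceberg_table SET c1 = 12 WHERE c1 = 11;
DELETE FROM main.schema.iceberg_table WHERE c1 = 12;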
Step 4. Enable external data access. This is a metastore-level setting, disabled by default, that a metastore admin turns on in the Catalog Explorer metastore settings; without it, external engines cannot reach the UC Iceberg REST Catalog.
Step 5. Grant EXTERNAL USE SCHEMA to the principal that will connect from outside Databricks. Note that this privilege is deliberately excluded from ALL PRIVILEGES, so it must be granted explicitly.
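For example, granting to a hypothetical service principal that the external engine will authenticate as:
GRANT EXTERNAL USE SCHEMA ON SCHEMA main.schema TO `my-service-principal`;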
Step 6. Configure the Iceberg REST Catalog client in the external engine. The examples below use Snowflake’s catalog integration syntax.
Auth Option A — Service Principal
CREATE OR REPLACE CATALOG INTEGRATION unity_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<workspace-uri>/api/2.1/unity-catalog/iceberg-rest'
    WAREHOUSE = '<uc-catalog-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_TOKEN_URI = 'https://<workspace-uri>/oidc/v1/token'
    OAUTH_CLIENT_ID = '<client-id>'
    OAUTH_CLIENT_SECRET = '<client-secret>'
    OAUTH_ALLOWED_SCOPES = ('all-apis')
  )
  ENABLED = TRUE;
Auth Option B — Personal Access Token
CREATE OR REPLACE CATALOG INTEGRATION unity_catalog
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<uc-schema-name>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<workspace-uri>/api/2.1/unity-catalog/iceberg-rest'
    WAREHOUSE = '<uc-catalog-name>'
    ACCESS_DELEGATION_MODE = VENDED_CREDENTIALS
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<personal-access-token>'
  )
  ENABLED = TRUE;
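With the integration in place, Snowflake can surface the UC-managed table as an externally managed Iceberg table. A sketch using Snowflake’s CREATE ICEBERG TABLE syntax; the Snowflake database and schema names are hypothetical, and the namespace comes from CATALOG_NAMESPACE on the integration:
-- Hypothetical Snowflake database/schema; references the integration defined above.
CREATE ICEBERG TABLE my_db.my_schema.iceberg_table
  CATALOG = 'unity_catalog'
  CATALOG_TABLE_NAME = 'iceberg_table';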
ℹ️ Additional note for Snowflake writes
Apache Iceberg’s open, engine-agnostic design enables true cross-platform interoperability—but only if paired with the right catalog strategy. By deciding early whether a dataset should be treated as Foreign Iceberg or Managed Iceberg, and by following best practices for federation, REST Catalog integration, and governance, you can build a lakehouse architecture that supports multiple engines without sacrificing performance, security, or maintainability.