cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Foreign catalog to Snowflake

emanueol
New Contributor III

While learning about Databricks foreign catalogsn (I'm on free tier DBX account), seems theres 2 ways creating foreign catalog to Snowflake:

  1. via CONNECTION type=snowflake, seems jdbc connection through where DBX pulls all metadata of 1 snowflake database. Additionally, by providing external location/paths on the CATALOG Basics DBX can directly access Snowflake Iceberg tables, falling back to jdbc if any issue with Iceberg tables, and jdbc default for all non-iceberg tables in Snowflake, and I think got this right (was able setting up this CONNECTION and FOREIGN CATALOG via DVX webUI.

Now, to create foreign catalog via Snowflake Horizon REST Iceberg API, I wonder how to achieve this:

  • Is it via a new type of entry type on new CONNECTION (something like Snowflake Horizon Rest Iceberg) ?
  • or is it same new CONNECTION type = "Snowflake", but then some sub option to distinguish between JDBC (query formation) vs Catalog Federation?

I need help, because as i don't see any "Snowflake Horizon" option or suboption anywhere in DBX webUI, Im wondering maybe trial/free DBX accounts CAN'T create foreign catalog against Snowflake Horizon REST Iceberg API ?

Thanks so much ~

1 ACCEPTED SOLUTION

Accepted Solutions

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @emanueol,

Your follow-up question is clear, and it is a good distinction to make. Let me address it directly.

SHORT ANSWER

Yes, when you set up Snowflake Catalog Federation in Databricks, Databricks does use the Snowflake Horizon REST Iceberg catalog API under the hood to discover and resolve Iceberg table metadata. There is no separate "Snowflake Horizon REST Iceberg" connection type you need to create. It is all handled through the same Snowflake connection type, with the catalog federation configuration enabling the Iceberg REST catalog behavior automatically.

HOW IT WORKS UNDER THE HOOD

When you create a Snowflake connection and then create a foreign catalog with a storage location (catalog federation), here is what happens at query time:

1. Databricks uses the connection credentials to communicate with Snowflake Horizon Catalog to verify that the target table is an Iceberg table and to request the latest metadata.json path and table location.

2. If the table qualifies as an Iceberg table (compatible URI scheme like s3, abfs, gs, etc., and metadata is within the table location), Databricks reads the data directly from object storage using Databricks compute only. Snowflake compute is not involved in the query execution.

3. If the table does not qualify for direct Iceberg access (unsupported URI scheme, special characters, metadata outside the table directory, or it is not an Iceberg table at all), Databricks falls back to JDBC-based query federation, where the query is pushed down to Snowflake compute.

So to directly answer your question: you do not need a separate connection type for Snowflake's REST Iceberg API. The standard Snowflake connection type, combined with catalog federation setup (which includes specifying external locations and storage credentials), already leverages the Snowflake Horizon catalog endpoint for Iceberg metadata resolution.

WHY YOU DO NOT SEE A SEPARATE OPTION IN THE UI

There is no separate "Snowflake Horizon REST Iceberg" connection type because the architecture is designed so that a single Snowflake connection serves both purposes:

- Query federation (JDBC): for non-Iceberg tables and fallback scenarios
- Catalog federation (Iceberg via Horizon): for direct object storage access to Iceberg tables

The difference is in how you configure the foreign catalog, not the connection itself. When you add a storage location to the catalog configuration, you are enabling the catalog federation path that uses the Horizon REST catalog for metadata discovery.

SETUP SUMMARY

1. Create a Snowflake connection (same connection type for both approaches)
2. Create a storage credential for the cloud storage where Snowflake Iceberg tables reside
3. Create an external location for the relevant storage paths
4. Create a foreign catalog specifying the connection AND a metadata storage location

The key step that enables catalog federation (and thus the Horizon REST catalog path) is step 4, where you provide the storage location on the foreign catalog. Without that, you get pure JDBC query federation.

FREE TIER CONSIDERATIONS

Catalog federation requires specific infrastructure:
- Databricks Runtime 13.3 LTS or above
- Standard or Dedicated access mode clusters, or Pro/Serverless SQL warehouses
- Network connectivity from Databricks to both Snowflake and the cloud storage where Iceberg data resides
- Proper Unity Catalog permissions (CREATE CONNECTION, CREATE CATALOG, plus storage credential and external location setup)

If your free tier workspace supports these requirements, you should be able to configure catalog federation. If you are not seeing the option to specify a storage location when creating the foreign catalog, it may be a tier limitation or a UI path issue. Try following the catalog federation specific documentation here:
https://docs.databricks.com/en/query-federation/snowflake-catalog-federation.html

And the general connection setup here:
https://docs.databricks.com/en/query-federation/snowflake.html

Hope this clears up the distinction. The short version: Databricks already supports accessing Snowflake Iceberg tables via the Horizon catalog endpoint, and it is built into the existing Snowflake catalog federation workflow, not a separate connection type.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

View solution in original post

10 REPLIES 10

MoJaMa
Databricks Employee
Databricks Employee

Good question. I think there is some general confusion on Lakehouse Federation (JDBC) and Catalog Federation (Object Storage Access). 

Step 1. Create Connection.

https://docs.databricks.com/aws/en/query-federation/database-federation#create-a-connection

This enables Lakehouse Federation but also is a pre-req for Catalog Federation.

Step 2. Catalog Federation: 

https://docs.databricks.com/aws/en/query-federation/snowflake-catalog-federation#set-up-catalog-fede...

"The general steps for setting up catalog federation are similar to setting up query federation: create a connection to Snowflake first, then create and configure a foreign catalog. However, you must also create a storage credential and an external location for the paths to the Apache Iceberg tables registered in Snowflake."

emanueol
New Contributor III

Yes, I understood that creation of CONNECTION type= "Snowflake":

  • snowflake acc url is enough for JDBC, and CATALOG BASICS provide the EXTERNAL LOCATION & paths (Iceberg tables).

My understanding is (please confirm):

  1. DBX pulls via JDBC all Snowflake schemas, tables of specified database.
  2. upon access: if its Iceberg table try direct storage access, fallback to jdbc if something wrong with metadata/storage access/definition.
  3. For all non-iceberg tables --> use JDBC.

But how about the 3rd type of access - type "Snowflake Horizon" REST Iceberg API - access to only Snowflake Iceberg tables):

  • is this a new type of CONNECTION that I don't see because I'm on DBX  free tier ?
  • Or connection just same "Snowflake", meaning despite DBX user never sees "Snowflake Horizon rest catalog", implicitly DBX will indeed try connect to Snowflake end point on above step 2. instead of just using jdbc to pull info on snowflake iceberg tables, does DBX just appends "/iceberg" to the only physical supplied host field ~ snowflake account provided in the connection resulting on: https://<account_identifier>.snowflakecomputing.com/iceberg

Hope its clear my doubt now? Thanks in advance ~ 

MoJaMa
Databricks Employee
Databricks Employee

If you see the text in the second link I sent, you'll see this:

"Create a connection to Snowflake Horizon Catalog and create a foreign catalog using one of the authentication methods listed above. You must specify a location in cloud storage where metadata will be stored for Iceberg tables in this catalog."

ie, there's no 3rd mechanism needed.

And yes, your general understanding is correct.

To summarize that specifically:

 

  • When a table qualifies (Iceberg with supported URI/scheme), Databricks reads directly from object storage; otherwise it automatically falls back to JDBC (query federation).
  • Only specific schemes are supported (s3, abfs, gs, r2, wasb/wasbs, etc.); special characters or metadata outside the table location block direct access, triggering fallback.
  • Writing/reading via JDBC to Iceberg in a catalogโ€‘federated catalog is disabled once a storage location is configured on the catalog; nonโ€‘Iceberg tables are unaffected.

 

emanueol
New Contributor III

Thanks for your feedback, but I still don't have the insight Im trying ton<understand, sorry if it's on my side the way I expressed.

Horizon in Snowflake catalog.
BUT.. Recently Snowflake opened up a new REST ICEBERG API endpoint that that gives access to Snowflake Iceberg tables (it's comply to the standard iceberg rest api; similar to Unity Catalog REST ICEBERG API endpoint.
Snowflake SQL client drivers (JDBC, ODBC, etc) communicate with another  Snowflake REST SQL API underneath.
Meaning one can refer to Snowflake Horizon as just 1 thing, but in fact theres 2 endpoints: JDBC via SQL REST API, and the new endpoint ICEBERG REST API.

I been trying to understand in the context of acessing Snowflake managed Iceberg tables, if Databricks already supported the recent Snowflake REST ICEBERG API or not?

The future is multi compute sharing Iceberg tables via the standsrd REST ICEBERG API that both Unit Catalog and Snowflake Horizon exposes. I know how Snowflake can federate READ/WRITE with vended credentials connecting to UC rest iceberg api.

So just trying understand if the other way also possible:

Can Databricks connect to Snowflake Horizon via Snowflake REST ICEBERG API. (not JDBC) ?
Hope this clarifies, because documentation referring to Snowflake Horizon, but doesn't share if physical API only JDBC, orif DBX already supports Snowflake Horizon acccess via new REST ICEBERG API endpoint.

Hope this helps, maybe Databricks needs update documentation, but for now just trying confirm if DBX can or not uses the advanced rest iceberg.

Thanks in advance ~

SteveOstrowski
Databricks Employee
Databricks Employee

Hi @emanueol,

Your follow-up question is clear, and it is a good distinction to make. Let me address it directly.

SHORT ANSWER

Yes, when you set up Snowflake Catalog Federation in Databricks, Databricks does use the Snowflake Horizon REST Iceberg catalog API under the hood to discover and resolve Iceberg table metadata. There is no separate "Snowflake Horizon REST Iceberg" connection type you need to create. It is all handled through the same Snowflake connection type, with the catalog federation configuration enabling the Iceberg REST catalog behavior automatically.

HOW IT WORKS UNDER THE HOOD

When you create a Snowflake connection and then create a foreign catalog with a storage location (catalog federation), here is what happens at query time:

1. Databricks uses the connection credentials to communicate with Snowflake Horizon Catalog to verify that the target table is an Iceberg table and to request the latest metadata.json path and table location.

2. If the table qualifies as an Iceberg table (compatible URI scheme like s3, abfs, gs, etc., and metadata is within the table location), Databricks reads the data directly from object storage using Databricks compute only. Snowflake compute is not involved in the query execution.

3. If the table does not qualify for direct Iceberg access (unsupported URI scheme, special characters, metadata outside the table directory, or it is not an Iceberg table at all), Databricks falls back to JDBC-based query federation, where the query is pushed down to Snowflake compute.

So to directly answer your question: you do not need a separate connection type for Snowflake's REST Iceberg API. The standard Snowflake connection type, combined with catalog federation setup (which includes specifying external locations and storage credentials), already leverages the Snowflake Horizon catalog endpoint for Iceberg metadata resolution.

WHY YOU DO NOT SEE A SEPARATE OPTION IN THE UI

There is no separate "Snowflake Horizon REST Iceberg" connection type because the architecture is designed so that a single Snowflake connection serves both purposes:

- Query federation (JDBC): for non-Iceberg tables and fallback scenarios
- Catalog federation (Iceberg via Horizon): for direct object storage access to Iceberg tables

The difference is in how you configure the foreign catalog, not the connection itself. When you add a storage location to the catalog configuration, you are enabling the catalog federation path that uses the Horizon REST catalog for metadata discovery.

SETUP SUMMARY

1. Create a Snowflake connection (same connection type for both approaches)
2. Create a storage credential for the cloud storage where Snowflake Iceberg tables reside
3. Create an external location for the relevant storage paths
4. Create a foreign catalog specifying the connection AND a metadata storage location

The key step that enables catalog federation (and thus the Horizon REST catalog path) is step 4, where you provide the storage location on the foreign catalog. Without that, you get pure JDBC query federation.

FREE TIER CONSIDERATIONS

Catalog federation requires specific infrastructure:
- Databricks Runtime 13.3 LTS or above
- Standard or Dedicated access mode clusters, or Pro/Serverless SQL warehouses
- Network connectivity from Databricks to both Snowflake and the cloud storage where Iceberg data resides
- Proper Unity Catalog permissions (CREATE CONNECTION, CREATE CATALOG, plus storage credential and external location setup)

If your free tier workspace supports these requirements, you should be able to configure catalog federation. If you are not seeing the option to specify a storage location when creating the foreign catalog, it may be a tier limitation or a UI path issue. Try following the catalog federation specific documentation here:
https://docs.databricks.com/en/query-federation/snowflake-catalog-federation.html

And the general connection setup here:
https://docs.databricks.com/en/query-federation/snowflake.html

Hope this clears up the distinction. The short version: Databricks already supports accessing Snowflake Iceberg tables via the Horizon catalog endpoint, and it is built into the existing Snowflake catalog federation workflow, not a separate connection type.

* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.

Thanks @SteveOstrowski exactly the detail I was expecting and kinda suspected but wanted to besure, and surely will help both communities ๐Ÿ˜Š.

Steve's answer is incorrect about Databricks using Iceberg Rest APIs. Catalog Federation is simply a workaround & marketing term for NOT supporting Databricks UNITY from accessing Iceberg Rest Catalog APIs to read or write from external IRC Catalogs.  

Steve, your answer incorrect. Databricks does NOT use Horizon/Polaris or any other IRC rest APIs to talk to any external system.  Access to any external Iceberg Catalog like Glue, Google. Horizon, Polaris & etc. is blocked by UNITY either to read or write data.

This is exactly why you have to provide a JDBC connection string for the federated catalogs. Databricks needs to run SQL queries for each table to figure out the schema, columns but more importantly where the actual iceberg files are located.

Because UNITY won't allow IRC Rest Catalog APIs to other catalogs, running bunch of DESC TABLE or SHOW COLUMNS type SQL queries for metadata using Provider's compute (Snowflake compute in the case of Horizon) for each query is both a bottle neck for performance, adds cost to provider but more importantly does not provide the temp vended creds to access the table locations as IRC APIs would.

Because it has no way of fetching vended creds via SQL, it required provider (Snowflake admin) to grant permanent full read/write access to Databricks for their main iceberg table storage locations (Unsecure, major compliancy issues) as Databricks owner can mistakenly blow up any file or folder with that kind of access w/o any knowledge of catalog owner (Snowflake).

It still needs WRITE access even if it needs to only read the table because Databricks compute (mainly Photon) does not currently have ability to directly read Iceberg tables. It will generate Delta metadata for each IB table and use that to query instead of reading Iceberg directly.

If it did use IRC APIs for external catalogs for read & write ops like Glue or Snowflake can, it would :

  1. Not need a JDBC Connection (APIs are Driverless.Only login creds would be enough to talk to IRC Apis)
  2. Not need SNOWFLAKE_WAREHOUSE for that connection as IRC Rest APIs do not use any provider compute clusters to run queries.
  3. Not need permanent read/write access to root storage locations for all iceberg table because IRC dynamically sends Temp Vended Creds for compliant engines to access the files.

 

 

 

 

SteveOstrowski
Databricks Employee
Databricks Employee

Hi,

Appreciate the detailed pushback. A few of the technical claims here are mixing up two distinct Databricks mechanisms, so let me separate them, because the answer changes depending on which one you mean.

1. Snowflake catalog federation does not run on Snowflake compute and does not require a warehouse for Iceberg tables. For Snowflake-managed Iceberg tables, Unity Catalog reads the Iceberg table directly from object storage on Databricks compute. Per the docs, "Databricks Runtime checks the returned metadata and reads the Iceberg table directly from the object storage path" and "the query is only executed using Databricks compute." No Snowflake warehouse is used for those tables. The JDBC connection and Snowflake compute only come into play as a fallback for tables that are not eligible Iceberg tables (standard query federation), which is a different path.

2. It reads Iceberg natively rather than converting to Delta. The federation path "directly accesses the Snowflake Managed Iceberg table in object storage" in native Iceberg format. There is no Delta-metadata-generation step required to read these tables.

3. The access requirement is read, not permanent read/write. Reading a federated Snowflake Iceberg table requires SELECT plus read access to the underlying storage path via a Unity Catalog external location and storage credential. It does not require granting Databricks permanent read/write on the provider's root storage to read a table.

4. On the IRC REST API point specifically: Unity Catalog does implement an Apache Iceberg REST Catalog server, and it supports credential vending to external engines. Per the credential-vending docs, "Credential vending supports external systems that connect to Unity Catalog using the Unity REST API and Apache Iceberg REST catalog," issuing short-lived, scoped credentials. So the blanket statement that Databricks does not use IRC REST APIs at all is not accurate. What is true is that the consumption direction and per-source support vary by table type (for example, foreign Iceberg tables surfaced through federation are read-only).

Net: catalog federation to Snowflake is direct-storage, Databricks-compute, read-oriented for Iceberg tables, and Unity Catalog does speak the Iceberg REST Catalog protocol on the serving side. Happy to go deeper on any one of these.

Sources:

 

You may also find this recent announcement relevant if you're in the Iceberg space:
https://www.databricks.com/blog/apache-icebergtm-v3-moving-ecosystem-towards-unification

 

Hi Steve,

Again, it is not entirely accurate.

1. Databricks Snowflake Warehouses for Non-Iceberg Tables however for Iceberg Tables, it will use Snowflake Compute in the Global Services (GS) layer to run SQL metadata queries (SHOW TABLE, SHOW COLUMNS DESCRIBE TABLE & etc.).  GS layer in Snowflake is shared compute that allows Snowflake customers to run many metadata queries for free w/o needing to start a paid warehouse. Unfortunately it is free compute ONLY if the usage on this layer stays below 10% of the actual warehouse compute spend. For 99.9% of the case, no-one runs abnormal amount of metadata queries where it exceed 10% of their compute so they never pay.  Problem with Databricks using this JDBX + SQL method to fetch metadata from Snowflake per each table & per each query means, Databricks is running exclusively metadata queries in the hundreds or many thousands (if these tables are queried often). Since warehouse is not used for these queries, their cost exceeds 10% of compute usage and customers are billed for all this compute. Databricks SEs and AEs tell Snowflake customers that these queries are free without knowing how the GS compute is billed. It is designed to be FREE when used as intended. When the Free Metadata options is abused and you run 10K DESC or SHOW table queries then that is billed at the warehouse rate. 

2. Unfortunately is does not. It was a little mistake on my part for where you need the WRITE access. For Federated Catalog, you need to set tup 2 storage location. 

Authorized Paths: This is the root storage bucket of all Iceberg tables from the provider which the provider has to grant read-only ACCESS. Again, because you don't use IRC APIs and not able to fetch temp vend creds, there is no way to easily lock down the sub folders for the tables DBX needs access by the provider. If they 1000 Iceberg tables but want to grant you 100 of them via the catalog, they either need to give read access to root for all tables (which is not secure) OR they somehow have to come up with complex IAM rules that grants access to only those 100 tables(that is a maintenance nightmare as IAM rules are separate than RBAC.  IF they grant you access to more tables or remove RBAC access to existing tables, provider than has to to their IAM rules and make sure they fix them each time to match the proper folders). Either way, this is not something anyone would use unless it is the same company, provider fully trusts the Databricks team with their files and no compliancy issues. 

Storage Location: This is where you need to define an additional cloud storage for Federated Catalog to be able to WRITE.  It needs this because Databricks Serverless & PHOTON engines & UNITY does not read ICEBERG table format directly. They can read the actual Data Parquet files but the IB metadata files. This is why it needs this access as it needs to create Delta version of the iceberg table metadata and that's what Databricks Engines use to query the tables directly.  

3. I explain the WRITE requirement. It is not for the IB storage location but can be any Storage bucket. It is required to read IB tables using Databricks compute for federated catalog.

4. This is where Databricks keeps giving the same unrelated wrong answer to this same question and creating confusion in the field and the customer base who manage data in external Iceberg Catalogs like Horizon, Glue & etc.. Reminder: The question was CAN DATABRICKS COMPUTE TALK TO OTHER EXTERNAL IRC CATALOGS USING IRC APIS & USE VENDED CREDS to access their Iceberg tables. 

It was NOT "Does Databricks uses IRC APIs and Vended Creds to Serve its own tables within UNITY?."  Databricks seems to purposefully give this misleading answer every time to avoid giving the correct answer. The answer is about Whether Databricks can to access other catalogs and not the other way around. The correct answer is Databricks currently HAS NO NATIVE SUPPORT FOR READING OR WRITING to external Iceberg tables using IRC API or by using Vended Creds to access the data files.

This is why it still needs JDBC connection to run SQL metadata queries to find what the table columns look like, where the tables files are stored and pre-approved permanent READ-ONLY access to read those files + another storage bucket with WRITE access to generate the Delta Matadata files for their compute to be able to query these tables.

If Databricks has support for "Their compute to read/write other external catalogs" using IRC APIs, then none of the above would be needed. No need for JDBC because you wouldn't run SQL "metadata" queries that cost Snowflake customers money as APIs give you all that info, Wouldn't need permanent read access IB files ahead of time as IRC APIs give temp vended creds to access the files.