Data Governance
Join discussions on data governance practices, compliance, and security within the Databricks Community. Exchange strategies and insights to ensure data integrity and regulatory compliance.

Foreign catalog to Snowflake

emanueol
New Contributor III

While learning about Databricks foreign catalogs (I'm on a free-tier DBX account), it seems there are two ways of creating a foreign catalog to Snowflake:

  1. via a CONNECTION of type "snowflake": this seems to be a JDBC connection through which DBX pulls all the metadata of one Snowflake database. Additionally, by providing external locations/paths in the catalog basics, DBX can directly access Snowflake Iceberg tables, falling back to JDBC if there is any issue with an Iceberg table, with JDBC as the default for all non-Iceberg tables in Snowflake. I think I got this right (I was able to set up this CONNECTION and FOREIGN CATALOG via the DBX web UI).

Now, to create a foreign catalog via the Snowflake Horizon REST Iceberg API, I wonder how to achieve this:

  • Is it via a new entry type on a new CONNECTION (something like "Snowflake Horizon REST Iceberg")?
  • Or is it the same CONNECTION type = "Snowflake", but with some sub-option to distinguish between JDBC (query federation) vs Catalog Federation?

I need help, because I don't see any "Snowflake Horizon" option or sub-option anywhere in the DBX web UI, so I'm wondering whether trial/free DBX accounts simply can't create a foreign catalog against the Snowflake Horizon REST Iceberg API?

Thanks so much ~


6 REPLIES

MoJaMa
Databricks Employee

Good question. I think there is some general confusion between Lakehouse Federation (JDBC) and Catalog Federation (object storage access).

Step 1. Create Connection.

https://docs.databricks.com/aws/en/query-federation/database-federation#create-a-connection

This enables Lakehouse Federation, but it is also a prerequisite for Catalog Federation.

Step 2. Catalog Federation: 

https://docs.databricks.com/aws/en/query-federation/snowflake-catalog-federation#set-up-catalog-fede...

"The general steps for setting up catalog federation are similar to setting up query federation: create a connection to Snowflake first, then create and configure a foreign catalog. However, you must also create a storage credential and an external location for the paths to the Apache Iceberg tables registered in Snowflake."

emanueol
New Contributor III

Yes, I understood that for the creation of a CONNECTION of type "Snowflake":

  • the Snowflake account URL is enough for JDBC, and the catalog basics provide the EXTERNAL LOCATION & paths (Iceberg tables).

My understanding is (please confirm):

  1. DBX pulls via JDBC all the Snowflake schemas and tables of the specified database.
  2. upon access: if it's an Iceberg table, try direct storage access, falling back to JDBC if something is wrong with the metadata/storage access/definition.
  3. For all non-Iceberg tables --> use JDBC.

But what about the 3rd type of access, the "Snowflake Horizon" REST Iceberg API (access to Snowflake Iceberg tables only):

  • is this a new type of CONNECTION that I don't see because I'm on the DBX free tier?
  • Or is the connection just the same "Snowflake" type, meaning that even though the DBX user never sees "Snowflake Horizon REST catalog", DBX will implicitly try to connect to that Snowflake endpoint in step 2 above instead of just using JDBC to pull info on Snowflake Iceberg tables? Does DBX just append "/iceberg" to the only physical host field supplied (the Snowflake account provided in the connection), resulting in: https://<account_identifier>.snowflakecomputing.com/iceberg ?

I hope my doubt is clear now? Thanks in advance ~

MoJaMa
Databricks Employee

If you read the second link I sent, you'll see this:

"Create a connection to Snowflake Horizon Catalog and create a foreign catalog using one of the authentication methods listed above. You must specify a location in cloud storage where metadata will be stored for Iceberg tables in this catalog."

i.e., there's no 3rd mechanism needed.

And yes, your general understanding is correct.

To summarize that specifically:

 

  • When a table qualifies (Iceberg with supported URI/scheme), Databricks reads directly from object storage; otherwise it automatically falls back to JDBC (query federation).
  • Only specific schemes are supported (s3, abfs, gs, r2, wasb/wasbs, etc.); special characters or metadata outside the table location block direct access, triggering fallback.
  • Writing/reading via JDBC to Iceberg in a catalog-federated catalog is disabled once a storage location is configured on the catalog; non-Iceberg tables are unaffected.

 

emanueol
New Contributor III

Thanks for your feedback, but I still don't have the insight I'm trying to understand; sorry if it's on my side, the way I expressed it.

Horizon is Snowflake's catalog.
BUT... recently Snowflake opened up a new REST ICEBERG API endpoint that gives access to Snowflake Iceberg tables (it complies with the standard Iceberg REST API), similar to the Unity Catalog REST ICEBERG API endpoint.
Snowflake SQL client drivers (JDBC, ODBC, etc.) communicate with another Snowflake REST SQL API underneath.
Meaning one can refer to Snowflake Horizon as just one thing, but in fact there are two endpoints: JDBC via the SQL REST API, and the new ICEBERG REST API endpoint.

I have been trying to understand, in the context of accessing Snowflake-managed Iceberg tables, whether Databricks already supports the recent Snowflake REST ICEBERG API or not.

The future is multiple compute engines sharing Iceberg tables via the standard REST ICEBERG API that both Unity Catalog and Snowflake Horizon expose. I know how Snowflake can federate READ/WRITE with vended credentials connecting to the UC REST Iceberg API.

So just trying understand if the other way also possible:

Can Databricks connect to Snowflake Horizon via the Snowflake REST ICEBERG API (not JDBC)?
I hope this clarifies my question, because the documentation refers to Snowflake Horizon but doesn't say whether the physical API is JDBC only, or whether DBX already supports Snowflake Horizon access via the new REST ICEBERG API endpoint.
Maybe Databricks needs to update the documentation, but for now I'm just trying to confirm whether or not DBX can use the REST Iceberg API.

Thanks in advance ~

SteveOstrowski
Databricks Employee

Hi @emanueol,

Your follow-up question is clear, and it is a good distinction to make. Let me address it directly.

SHORT ANSWER

Yes, when you set up Snowflake Catalog Federation in Databricks, Databricks does use the Snowflake Horizon REST Iceberg catalog API under the hood to discover and resolve Iceberg table metadata. There is no separate "Snowflake Horizon REST Iceberg" connection type you need to create. It is all handled through the same Snowflake connection type, with the catalog federation configuration enabling the Iceberg REST catalog behavior automatically.

HOW IT WORKS UNDER THE HOOD

When you create a Snowflake connection and then create a foreign catalog with a storage location (catalog federation), here is what happens at query time:

1. Databricks uses the connection credentials to communicate with Snowflake Horizon Catalog to verify that the target table is an Iceberg table and to request the latest metadata.json path and table location.

2. If the table qualifies as an Iceberg table (compatible URI scheme like s3, abfs, gs, etc., and metadata is within the table location), Databricks reads the data directly from object storage using Databricks compute only. Snowflake compute is not involved in the query execution.

3. If the table does not qualify for direct Iceberg access (unsupported URI scheme, special characters, metadata outside the table directory, or it is not an Iceberg table at all), Databricks falls back to JDBC-based query federation, where the query is pushed down to Snowflake compute.
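To make the fallback rule in steps 2 and 3 concrete, here is a minimal sketch of the decision as described above. It models only the documented behavior (not Databricks internals), and the scheme list is the one quoted elsewhere in this thread:

```python
from urllib.parse import urlparse

# URI schemes this thread lists as eligible for direct object-storage reads.
DIRECT_READ_SCHEMES = {"s3", "abfs", "gs", "r2", "wasb", "wasbs"}

def access_path(is_iceberg: bool, table_location: str, metadata_location: str) -> str:
    """Return 'direct' (Databricks reads object storage) or 'jdbc' (pushdown to Snowflake)."""
    if not is_iceberg:
        return "jdbc"  # non-Iceberg tables always go through JDBC query federation
    if urlparse(table_location).scheme not in DIRECT_READ_SCHEMES:
        return "jdbc"  # unsupported URI scheme -> fallback
    if not metadata_location.startswith(table_location.rstrip("/") + "/"):
        return "jdbc"  # metadata outside the table directory -> fallback
    return "direct"
```

So for an Iceberg table at `s3://bkt/tbl` with metadata under `s3://bkt/tbl/metadata/`, this sketch chooses the direct path; anything else degrades to JDBC.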

So to directly answer your question: you do not need a separate connection type for Snowflake's REST Iceberg API. The standard Snowflake connection type, combined with catalog federation setup (which includes specifying external locations and storage credentials), already leverages the Snowflake Horizon catalog endpoint for Iceberg metadata resolution.

WHY YOU DO NOT SEE A SEPARATE OPTION IN THE UI

There is no separate "Snowflake Horizon REST Iceberg" connection type because the architecture is designed so that a single Snowflake connection serves both purposes:

- Query federation (JDBC): for non-Iceberg tables and fallback scenarios
- Catalog federation (Iceberg via Horizon): for direct object storage access to Iceberg tables

The difference is in how you configure the foreign catalog, not the connection itself. When you add a storage location to the catalog configuration, you are enabling the catalog federation path that uses the Horizon REST catalog for metadata discovery.

SETUP SUMMARY

1. Create a Snowflake connection (same connection type for both approaches)
2. Create a storage credential for the cloud storage where Snowflake Iceberg tables reside
3. Create an external location for the relevant storage paths
4. Create a foreign catalog specifying the connection AND a metadata storage location

The key step that enables catalog federation (and thus the Horizon REST catalog path) is step 4, where you provide the storage location on the foreign catalog. Without that, you get pure JDBC query federation.
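As a rough illustration of those four steps, here is how they might look as Databricks SQL run from a notebook. This is a sketch under assumptions: the object names (sf_conn, sf_cred, sf_loc, sf_cat) are invented, step 2 is left to Catalog Explorer, and the exact OPTIONS key for the metadata storage location should be taken from the catalog federation docs rather than from this snippet:

```python
# Sketch only: object names and literal option values are ILLUSTRATIVE.
# In a Databricks notebook you would run each statement via spark.sql(stmt).

SETUP_STEPS = {
    # Step 1: a single connection serves both JDBC query federation and
    # catalog federation.
    "connection": """
        CREATE CONNECTION IF NOT EXISTS sf_conn TYPE snowflake
        OPTIONS (host 'myacct.snowflakecomputing.com', port '443',
                 user 'dbx_reader', password '<redacted>', sfWarehouse 'WH')""",
    # Step 2 (storage credential, e.g. an IAM role for the Iceberg bucket) is
    # typically created in Catalog Explorer, so it is not shown as SQL here.
    # Step 3: an external location covering the Iceberg table paths.
    "external_location": """
        CREATE EXTERNAL LOCATION IF NOT EXISTS sf_loc
        URL 's3://my-bucket/snowflake-iceberg/'
        WITH (STORAGE CREDENTIAL sf_cred)""",
    # Step 4: the foreign catalog; supplying the metadata storage location here
    # is what enables catalog federation rather than pure JDBC federation.
    "foreign_catalog": """
        CREATE FOREIGN CATALOG IF NOT EXISTS sf_cat
        USING CONNECTION sf_conn
        OPTIONS (database 'ANALYTICS')
        -- plus the metadata storage location option documented for
        -- catalog federation""",
}

def run_setup(spark):
    """Execute the statements in order on a Databricks SparkSession."""
    for stmt in SETUP_STEPS.values():
        spark.sql(stmt)
```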

FREE TIER CONSIDERATIONS

Catalog federation requires specific infrastructure:
- Databricks Runtime 13.3 LTS or above
- Standard or Dedicated access mode clusters, or Pro/Serverless SQL warehouses
- Network connectivity from Databricks to both Snowflake and the cloud storage where Iceberg data resides
- Proper Unity Catalog permissions (CREATE CONNECTION, CREATE CATALOG, plus storage credential and external location setup)

If your free tier workspace supports these requirements, you should be able to configure catalog federation. If you are not seeing the option to specify a storage location when creating the foreign catalog, it may be a tier limitation or a UI path issue. Try following the catalog federation specific documentation here:
https://docs.databricks.com/en/query-federation/snowflake-catalog-federation.html

And the general connection setup here:
https://docs.databricks.com/en/query-federation/snowflake.html

Hope this clears up the distinction. The short version: Databricks already supports accessing Snowflake Iceberg tables via the Horizon catalog endpoint, and it is built into the existing Snowflake catalog federation workflow, not a separate connection type.
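For anyone who wants to poke at the REST side directly, the standard Iceberg REST catalog contract that both Unity Catalog and Horizon implement starts from a well-known configuration endpoint. A minimal sketch, assuming only the Apache Iceberg REST spec (the Snowflake host path shown is a placeholder, not a documented endpoint):

```python
# Every Iceberg-REST-compliant catalog exposes a spec-defined configuration
# route that clients call first: GET {base}/v1/config. This helper only shows
# that spec-level contract; the example base URI is a PLACEHOLDER, since this
# thread itself never confirms Snowflake's exact REST path.

def config_url(base_uri: str) -> str:
    """Build the Iceberg REST catalog configuration endpoint for a base URI."""
    return base_uri.rstrip("/") + "/v1/config"

# Example with a placeholder host (NOT a confirmed Snowflake endpoint):
# config_url("https://<account>.snowflakecomputing.com/<rest-path>")
```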

* This reply used an agent system I built to research and draft this response, based on the wide set of documentation I have available and previous memory. I personally review the draft for obvious issues, monitor system reliability, and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.

emanueol
New Contributor III

Thanks @SteveOstrowski, exactly the detail I was expecting and kind of suspected, but I wanted to be sure. This will surely help both communities 😊.