Hey @TJ-Leap-Forward ,
Yes โ you can stand up Databricks AI/BI Genie on top of Snowflake quickly by federating Snowflake into Unity Catalog and then building Genie spaces over those governed datasets, without migrating data out of Snowflake. This works because Genie operates on Unity Catalogโregistered data, including foreign (federated) tables and views.
What works and why
-
Genie spaces use Unity Catalog metadata and author-provided instructions to translate natural language into SQL over your governed datasets, including foreign tables and metric views.
-
Lakehouse Federation natively supports Snowflake, so you can register Snowflake databases in Unity Catalog and query them from Databricks (read-only), with pushdowns for common operations to Snowflake for performance.
-
You can choose between two federation modes:
- Query federation (JDBC pushdown to Snowflake compute) for fast, on-demand access.
- Catalog federation (direct reads of Iceberg tables in object storage using Databricks compute) for better cost/performance when your Snowflake data is stored as Iceberg.
Both options are supported for Snowflake connections in Unity Catalog.
-
Governance is unified: Unity Catalog handles permissions, lineage, and auditing on federated Snowflake tables, and Genie respects those permissions when answering questions.
Fastest path: 30โ60 minute setup
-
Prereqs
- Workspace is enabled for Unity Catalog; network connectivity to Snowflake; SQL Warehouse Pro/Serverless (2023.40+); Databricks Runtime 13.3 LTS+ for compute.
-
Create a Snowflake security integration (for OAuth) in Snowflake:
CREATE SECURITY INTEGRATION aibi_uc_oauth
TYPE = oauth
ENABLED = TRUE
OAUTH_CLIENT = custom
OAUTH_CLIENT_TYPE = 'CONFIDENTIAL'
OAUTH_REDIRECT_URI = 'https://<your-workspace-url>/login/oauth/snowflake.html'
OAUTH_ISSUE_REFRESH_TOKENS = TRUE
OAUTH_REFRESH_TOKEN_VALIDITY = 7776000
OAUTH_ENFORCE_PKCE = TRUE;
-
Create a Unity Catalog connection to Snowflake (UI or SQL):
CREATE CONNECTION sf_conn
TYPE SNOWFLAKE
OPTIONS (
host 'your-acct.region.cloud.snowflakecomputing.com',
port '443',
auth_type 'oauth',
client_id '<from SYSTEM$SHOW_OAUTH_CLIENT_SECRETS>',
client_secret '<from SYSTEM$SHOW_OAUTH_CLIENT_SECRETS>',
oauth_scope 'refresh_token session:role:<ROLE>',
user 'SNOWFLAKE_USER'
);
-
Create a foreign catalog that mirrors your Snowflake database:
CREATE FOREIGN CATALOG snowflake_sales
USING CONNECTION sf_conn
OPTIONS (database 'SALES_DB');
-
Grant access in Unity Catalog (catalog/schema/table as needed) and test a query from Databricks SQL:
SELECT * FROM snowflake_sales.public.orders LIMIT 10;
-
Build your Genie space
- Select the federated Snowflake tables/views (and any metric views) as datasets.
- Add example queries, parameters, synonyms, and space instructions to align with your semantic layer.
- Use โTrusted assetsโ (parameterized examples/functions) to mark vetted answers as trusted.
- Publish to business users (Consumer Access) and monitor with the admin tools.
Performance and architecture options
-
Query federation is ideal for fast proof-of-concept access; Databricks pushes filters/joins/aggregations down to Snowflake via JDBC to keep latency low.
-
Catalog federation (Iceberg) lets Databricks read Iceberg tables in your Snowflake catalog directly from cloud object storage, using Databricks compute for better cost/perf. Configure an external location for Iceberg paths; Databricks auto-detects Iceberg tables and falls back to query federation if a table isnโt eligible
-
For frequently queried Snowflake sources, consider materialized views in Databricks to stabilize concurrency and accelerate cross-source joins, while retaining UC governance.
Important considerations
-
Read-only: Federated Snowflake tables are read-only from Databricks; writes remain in Snowflake or via pipelines into managed UC tables.
-
Case sensitivity: Preserve case for Snowflake identifiers by quoting the database/schema/table names where needed.
-
Limits and sizing: Large result sets can pressure single-stream JDBC reads; design queries/filters accordingly and leverage metric views or materialized views for heavy workloads.
-
Governance: Unity Catalog enforces permissions and lineage across federated data; Genie uses UC metadata and your semantic annotations to improve answer quality.
Hope this helps, Louis.