Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

DLT pipeline cannot read from a Unity Catalog foreign catalog

QueryingQuail
New Contributor III

We are having some difficulty working with OneLake connections.

What we have done:

  1. Set up a Databricks connection to OneLake
  2. Created a foreign catalog

We try to read using:

import dlt

@dlt.table
def fabric_test():
    return spark.read.table("fabric.dbo.nnit_buildingcase")

And we then get the error:

error:
Message: Failed to find the data source: UNKNOWN_CONNECTION_TYPE. Make sure the provider name is correct and the package is properly registered and compatible with your Spark version.
Error class: DATA_SOURCE_NOT_FOUND
SQL state: 42K02
Table: [REDACTED].[REDACTED].[REDACTED]

I know this error could theoretically have multiple causes, but maybe someone here can tell me whether the above *should* be possible as written. One more note: in the Databricks UI, the connection's URL is shown as "https://internal/".

1 ACCEPTED SOLUTION

SteveOstrowski
Databricks Employee

Hi @QueryingQuail,

The behavior you are seeing is expected. Lakeflow Spark Declarative Pipelines (SDP), previously known as DLT, do not currently support reading directly from Lakehouse Federation foreign catalogs. The DATA_SOURCE_NOT_FOUND / UNKNOWN_CONNECTION_TYPE error occurs because the SDP pipeline runtime does not have the federation connector needed to resolve the foreign catalog connection at query time.

For reference, Unity Catalog-enabled SDP pipelines can read from these data sources:

- Unity Catalog managed and external tables, views, materialized views, and streaming tables
- Hive metastore tables and views
- Auto Loader (using the read_files() function) from Unity Catalog external locations
- Apache Kafka and Amazon Kinesis

Foreign catalogs (including OneLake connections) are not in that supported list. This applies regardless of the foreign catalog type.

You can find the supported sources documented here:
https://docs.databricks.com/en/delta-live-tables/unity-catalog.html
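For context, the Auto Loader path from the supported list above looks roughly like this in pipeline SQL. The storage path, file format, and table name here are placeholders for illustration, not values from this thread:

```sql
-- Hypothetical sketch: ingest files from a UC external location with Auto Loader.
-- Path, format, and table name are assumptions, not taken from the thread.
CREATE OR REFRESH STREAMING TABLE buildingcase_raw
AS SELECT *
FROM STREAM read_files(
  'abfss://landing@mystorage.dfs.core.windows.net/buildingcase/',
  format => 'parquet'
);
```

This only applies if the OneLake data is also reachable as files in cloud storage; it does not go through the foreign catalog at all.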

The "https://internal/" URL you noticed in the connection details is normal for OneLake catalog federation connections, so that part is not the issue.

WORKAROUND

The recommended approach is to stage the data from your foreign catalog into a regular Unity Catalog table first, and then have your SDP pipeline read from that UC table. You can do this with a scheduled Lakeflow Job (or a simple notebook) that runs a query like:

CREATE OR REPLACE TABLE my_catalog.my_schema.onelake_buildingcase
AS SELECT * FROM fabric.dbo.nnit_buildingcase

Then update your pipeline code to read from the staged table:

import dlt

@dlt.table
def fabric_test():
  return spark.read.table("my_catalog.my_schema.onelake_buildingcase")

If you need incremental/streaming behavior, you could use a MERGE statement in the staging job to handle updates, or use Auto Loader if the source data is available as files in cloud storage.
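A MERGE-based staging job could be sketched along these lines. The merge key (`case_id`), the column list, and the helper function are assumptions for illustration, not taken from the actual table:

```python
# Hypothetical sketch of an incremental staging step. The catalog/schema/table
# names, the `case_id` merge key, and the column list are placeholders.

def build_merge_sql(target: str, source: str, key: str, columns: list[str]) -> str:
    """Build a MERGE statement that upserts `source` rows into `target` on `key`."""
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in columns)
    cols = ", ".join(columns)
    vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t USING {source} s ON t.{key} = s.{key} "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT ({cols}) VALUES ({vals})"
    )

stmt = build_merge_sql(
    target="my_catalog.my_schema.onelake_buildingcase",
    source="fabric.dbo.nnit_buildingcase",
    key="case_id",                   # assumed key column
    columns=["case_id", "status"],   # assumed column list
)
# In a Databricks notebook you would then run: spark.sql(stmt)
print(stmt)
```

Building the statement from a column list keeps the staging job in sync with the source schema in one place, but verify the key column actually identifies rows uniquely before relying on this.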

ALTERNATIVE: NOTEBOOK TASK IN THE SAME JOB

If you want to keep everything in a single workflow, you can create a Lakeflow Job with two tasks:

1. A notebook task that queries the foreign catalog and writes to a UC table
2. Your SDP pipeline task (dependent on task 1) that reads from that UC table

This gives you a single orchestrated workflow while working within the current supported data source boundaries for SDP pipelines.
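As a sketch, such a two-task job could be declared with a Databricks Asset Bundle fragment like the following; every name, path, and the pipeline ID here are placeholders, not values from this thread:

```yaml
# Hypothetical bundle fragment: a notebook task stages the foreign-catalog
# data, then the pipeline task runs once staging succeeds.
resources:
  jobs:
    stage_and_run_pipeline:
      name: stage-onelake-then-run-sdp
      tasks:
        - task_key: stage_onelake
          notebook_task:
            notebook_path: /Workspace/etl/stage_onelake_buildingcase
        - task_key: run_sdp_pipeline
          depends_on:
            - task_key: stage_onelake
          pipeline_task:
            pipeline_id: <your-pipeline-id>
```

The same structure can of course be built in the Jobs UI instead; the key point is the dependency between the staging task and the pipeline task.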

* This reply was drafted with an agent system I built, which researches and drafts responses based on the documentation I have available and previous memory. I personally review each draft for obvious issues, monitor system reliability, and update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.


4 REPLIES 4

Ale_Armillotta
Contributor III

Hi @QueryingQuail .

Is it possible that you are hitting one of the limitations listed here? https://learn.microsoft.com/en-us/azure/databricks/query-federation/onelake#query-onelake-data

Did you try using only SQL?

Louis_Frolio
Databricks Employee

Greetings @QueryingQuail , 

Short answer: no. A Unity Catalog DLT pipeline can't read directly from a foreign catalog — which is what your OneLake "fabric" catalog is. That's why spark.read.table("fabric.dbo.nnit_buildingcase") fails inside DLT even when the foreign catalog itself works fine.

Here's why:

The catalog you created from the OneLake connection is a foreign catalog for query federation. It exposes tables managed by the external system — it's not a UC-native data source. UC-enabled pipelines are documented to support UC managed/external tables, Hive metastore tables, UC external locations via Auto Loader, and Kafka/Kinesis. Foreign tables aren't on that list. Worth flagging: the Lakeflow Connect docs explicitly state that the catalog created during connection setup "cannot be used" for an ingestion pipeline precisely because it's foreign. Ingestion requires a managed catalog.

So this pattern:

import dlt

@dlt.table
def fabric_test():
    return spark.read.table("fabric.dbo.nnit_buildingcase")

won't work today when fabric is a foreign catalog. The UNKNOWN_CONNECTION_TYPE / DATA_SOURCE_NOT_FOUND error is DLT saying this table type isn't supported in this runtime.

The supported pattern is a two-step approach:

First, ingest from OneLake into a UC managed catalog/schema — via a Lakeflow Connect ingestion pipeline or another ETL job — so the data lands as UC managed or external Delta tables, not foreign tables.

Second, point your DLT pipeline at those UC tables:

import dlt

@dlt.table
def fabric_staging():
    return spark.read.table("my_catalog.my_schema.nnit_buildingcase_staging")

This keeps OneLake access in the connection/ingestion layer and lets DLT operate only on UC-native tables — which is what it supports today.

Hope this helps, Louis


QueryingQuail
New Contributor III

Thank you all for the replies, and please excuse the delay on my part - I've been away.

I've read all three replies, and they clarified all my questions and broadened my understanding of foreign catalog handling within Unity Catalog.