topic Federated query on the source in Data Engineering

Federated query on the source

thomas_berry — Wed, 09 Apr 2025 09:11:53 GMT

Hello,

I want to be able to run an arbitrary query on the source before its result gets sent to databricks. I want to create something like this:

create table gold.bigquery
USING org.apache.spark.sql.jdbc
options
( url "jdbc:postgresql://---:---/---",
driver "org.postgresql.Driver", user "---",
password "---",
query "select aa, bb, cc from a join b on a.k = b.k join c on a.k = c.j"
)

but of course, I get this error:

[UC_FILE_SCHEME_FOR_TABLE_CREATION_NOT_SUPPORTED] Creating table in Unity Catalog with file scheme jdbc is not supported.
Instead, please create a federated data source connection using the CREATE CONNECTION command for the same table provider, then create a catalog based on the connection with a CREATE FOREIGN CATALOG command to reference the tables therein.

The only workaround that I see is creating a view on the source with that code and that view will be available in the foreign catalog. But the problem is that they won't let me do that.Is there any chance this could be a feature offered by Unity Catalog in the future?

I know that filters and groups are pushed down to the source through the foreign catalog. But the example query that I am showing uses joins which I know are not pushed down. Hence the necessity to create the query in the alternative way.

Thank You

Re: Federated query on the source

Brahmareddy — Tue, 15 Apr 2025 04:13:49 GMT

Hi thomas_berry,

How are you doing today?, As per my understanding, You're spot on with your understanding, and you're not alone in running into this limitation. Unity Catalog doesn’t currently support creating tables using a JDBC query like in your example, especially with complex joins, because UC doesn't allow JDBC-based file schemes for table creation—it expects you to use foreign catalogs through CREATE CONNECTION and CREATE FOREIGN CATALOG. And you're right—those only expose existing tables or views in the source, and joins in custom queries aren’t supported unless pre-defined as views on the source side. Unfortunately, if you’re not allowed to create views in the source system, that limits your options. For now, the best workaround is to pull the tables individually through the foreign catalog into Databricks and then write the join logic within Databricks (even though it won’t be pushed down). It’s not ideal for performance, but it's the safest option when view creation is off the table. Hopefully in the future, Unity Catalog might support more flexible query definitions for federated sources—but for now, we're a bit limited there. Let me know if you want help building a workaround using Databricks-side logic!

Regards,

Brahma