cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

remote_query() is not working

vr
Contributor III

I am trying to experiment with remote_query() function according to the documentation. The feature is in public preview, so I assume it should be available to everyone now.

select * from remote_query(
  'my_connection',
  database => 'mydb',
  dbtable => 'my_table'
  )

Whenever I query, I am getting one of two errors:

[INVALID_USAGE_OF_STAR_OR_REGEX] Invalid usage of '*' in Project. SQLSTATE: 42000
== SQL (line 2, position 8) ==
select * from remote_query(
       ^

If I change `*` to a column name, say, `id`, I get the following:

[UNRESOLVED_COLUMN.WITHOUT_SUGGESTION] A column, variable, or function parameter with name `id` cannot be resolved.  SQLSTATE: 42703
== SQL (line 2, position 8) ==
select id from remote_query(
       ^^

All prerequisites are satisfied: connection is live, which is confirmed by a Lakehouse Federation catalog built on top of the same connection. Runtime is 17.3 RTS, have USE CONNECTION (I am also the owner).

Can anyone please explain what I am doing wrong?

9 REPLIES 9

GA4
New Contributor

Does the cluster (what you have used to run the query) have manage permission on the connection?

 

vr
Contributor III

In Databricks, identities can be users, groups, and service principals. Cluster cannot be granted a permission. But the user of the cluster – yes, it has USE CONNECTION, and also the owner of connection.

Coffee77
Contributor III

Which sql database are you trying to connect to? Depending on source, maybe you need to reference schema in addition to database and table. You can check, this is only a guess.


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData

vr
Contributor III

Database is Postgres. I tried to prefix tables with schema, such as "public", but with no avail. Even something as trivial as "SELECT 1" returns the same error.

Database log also shows no activity – no successful or failed queries.

Coffee77
Contributor III

So, you mean, same connection working fine in SQL federated queries is not working with remote_query()? Really, really strange, it seems as if that function was not available in your environment... Not played yet with remote_query() but I'll do and let you know!


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData

Coffee77
Contributor III

Same issues from my end, both using All-purpose or SQL Warehouse clusters over a connection that works with SQL federation:

Coffee77_0-1764322632277.png

Coffee77_1-1764322656184.png

 

 


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData

vr
Contributor III

Thanks for testing @Coffee77. Yeah, that's really strange:

  • Federated queries over the same connection – work.
  • Spark JDBC directly to DB – works (this does not really use the connection, but just to add more evidence for proper connectivity).

Contacted Microsoft Support.

Coffee77
Contributor III

A little bit frustrating as there is no more we can do or test. I'll be also waiting for a response as it seems to be a bug or at least something is missing out in the documentation.


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData

Coffee77
Contributor III

Latest feedback from my end, just in case it could help to troubleshoot or solve the issue:

Coffee77_0-1764331191814.png

remote_query table value function (TVF) is not reachable from clusters.

 


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData