Databricks Community

RicksDB · ‎09-10-2021

Hi,

Is there any way/workaround to query JDBC tables the same way one can with other types of clusters?

Doing so right now causes an error saying that only text-based files are supported (JSON, Parquet, Delta, etc.), even though the tables are recognized in the SQL workspace.

Anonymous · ‎09-11-2021

Hello @E H!

My name is Piper and I'm one of the community moderators. I wanted to pop in and thank you for your question. I'm sure that a fellow community member will be by to answer your question shortly. If not, the team will be back on Monday.

Cheers!

Sebastian · ‎09-13-2021

To my understanding, the SQL endpoint is connected to your Hive metastore. One possible way is to register the corresponding JDBC table in your metastore so that its metadata is available to the SQL endpoint.

RicksDB · ‎09-13-2021

Hi Sebastian, thank you for your answer.

You are right, the SQL endpoint uses the same metastore as the other clusters. Therefore, after creating a JDBC table in a High Concurrency cluster, I do see the table in the SQL Analytics workspace. The table is recognized and the schema is even visible. However, when querying the table, an error occurs explaining that JDBC is not supported and only text-based files are. The same query works correctly when using the High Concurrency cluster.
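For readers unfamiliar with the setup, a JDBC table like the one described above is typically registered from a regular cluster with a `CREATE TABLE ... USING JDBC` statement. The following is only a minimal sketch of how such a statement could be assembled; the helper names, table name, URL, and credentials are all hypothetical placeholders, not the values used in this thread.

```python
def quote_option(value: str) -> str:
    """Wrap an option value in single quotes.

    To keep the sketch simple, values that would need escaping are rejected.
    """
    if "'" in value or "\\" in value:
        raise ValueError("option values with quotes or backslashes are not handled here")
    return "'" + value + "'"


def build_jdbc_table_ddl(table_name: str, jdbc_url: str, remote_table: str,
                         user: str, password: str) -> str:
    """Build a Spark SQL statement that registers a JDBC-backed table."""
    options = {
        "url": jdbc_url,
        "dbtable": remote_table,
        "user": user,
        "password": password,
    }
    rendered = ",\n".join(f"  {key} {quote_option(value)}" for key, value in options.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table_name}\n"
        "USING JDBC\n"
        f"OPTIONS (\n{rendered}\n)"
    )


# Hypothetical values for illustration only; in practice, read credentials
# from a secret store instead of hard-coding them.
ddl = build_jdbc_table_ddl(
    table_name="sales_jdbc",
    jdbc_url="jdbc:sqlserver://example-host:1433;database=exampledb",
    remote_table="dbo.sales",
    user="example_user",
    password="example_password",
)
print(ddl)
# On a regular (non-SQL-endpoint) cluster this would be run as: spark.sql(ddl)
```

Once registered this way, the table's metadata lives in the shared metastore, which is why it shows up in the SQL workspace even though the endpoint cannot read from it.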

BilalAslamDbrx · ‎09-17-2021

@E H can you share a screenshot of the error in Databricks SQL? And also a screenshot of the query working in the Data Science & Engineering workspace?

RicksDB · ‎09-18-2021

Hi Muhammed,

Here are the screenshots querying the same table. The first one is in the SQL Analytics workspace. The second one is in the engineering workspace.

SQL Endpoint

High Concurrency (or any other cluster besides SQL clusters)

RicksDB · ‎09-20-2021

To add more information to the topic,

I've seen this element within the release notes. Is this related? Was JDBC supported but deactivated?

BilalAslamDbrx · ‎09-21-2021

@E H I thought I replied, but apparently I didn't -- apologies for the late response. So there are two different things in play:

JDBC tables are supported by Apache Spark and, as you can see in your screenshot, they work just fine in the Data Science & Engineering workspace. However, they do NOT work in Databricks SQL. We are interested in supporting data sources like these, but we're likely to support Redshift, MySQL, etc. before we get to generic JDBC support. So, to be clear, the JDBC data source never worked in DBSQL.
External Data Sources are different. They are part of the open source Redash product, but we are not planning on developing this feature right now.

What actual engine are you connecting to with the JDBC data source? Is it MySQL, Postgres, etc.?
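Until such support exists, one possible workaround (an assumption based on the constraints described in this thread, not something the posters proposed) is to copy the JDBC table into a Delta table from a regular cluster, since Delta tables are readable from Databricks SQL. A minimal sketch follows; the helper name and both table names are hypothetical.

```python
import re

# Accept simple names such as "db.table"; anything else is rejected so the
# sketch never interpolates unexpected text into the statement.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$")


def build_delta_snapshot_sql(source_table: str, target_table: str) -> str:
    """Build a statement that copies a JDBC-backed table into a Delta table."""
    for name in (source_table, target_table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected table name: {name!r}")
    return (
        f"CREATE OR REPLACE TABLE {target_table}\n"
        "USING DELTA\n"
        f"AS SELECT * FROM {source_table}"
    )


# Hypothetical table names for illustration only.
snapshot_sql = build_delta_snapshot_sql("sales_jdbc", "sales_delta")
print(snapshot_sql)
# On a regular (non-SQL-endpoint) cluster this would be run as: spark.sql(snapshot_sql)
```

The Delta copy is only a snapshot, so it would need to be refreshed on a schedule, and queries against it do not reflect changes in the source database until the next refresh.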

realolap · ‎11-10-2021

Being able to join data coming from both Delta Lake and Azure SQL would be an excellent feature for SQL Analytics, giving us the option to supply the reporting community (Power BI) with a single data hub.

RicksDB · ‎09-21-2021

Thanks for the clarifications @Bilal Aslam.

The query above was executed on an Azure SQL Managed Instance data source.

Thanks,

Eric

RicksDB · ‎12-22-2021

Any news regarding this feature?

BilalAslamDbrx · ‎12-28-2021

@E H nope, not yet. It's definitely on our list of things to figure out and support.

dimsh · ‎12-23-2021

+1

It would be great to have this option. I'm on the way to building a Data Analytics Platform based on Databricks SQL for my customer. Databricks SQL is a really cool product, but it seems it doesn't support reading data from PostgreSQL (Azure DBaaS). For me, it just shows a schema and that's all. It works well in the Data Engineering workspace. Please consider this feature.

RyanD-AgCountry · ‎02-23-2022

I've been waiting patiently for this option since the public preview in early 2021. The vast majority of our data is in SQL Server databases, and our inability to query these data sources is the primary reason the data team hasn't adopted the SQL workspace as a solution. We support JDBC connectivity, and it is nice that the metastore works, but not being able to query the data is, in my opinion, an oversight.