cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to query delta lake using SQL desktop tools like SSMS or DBVisualizer

prasadvaze
Valued Contributor II

Is there a way to use sql desktop tools? because delta OSS or databricks does not provide desktop client (similar to azure data studio) to browse and query delta lake objects.

I currently use databricks SQL , a webUI in the databricks workspace but searching for means beyond that, especially for end business users who are familiar with SSMS type clients.

@Bilal Aslam​ 

1 ACCEPTED SOLUTION

Accepted Solutions

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@prasad vaze​ I don't use SSMS (not a Windows user), and I'm not even sure if it will work with Databricks. However, I did want to list the steps to get DBvisualizer to work:

  1. Download Databricks' JDBC drivers and extract them on your computer
  2. Start DbVisualizer and in Preferences > General > Driver Manager, add the folder where you extracted the driver to the search path. Click Ok to close the preferences dialog.
  3. Click Tools > Driver Manager and add a JDBC (Generic) driver
  4. Name the new driver "Databricks" and provide this JDBC URL: jdbc:spark://<workspace>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<endpoint>
  5. Driver class should be com.simba.spark.jdbc.Driver
  6. In the left pane, add a new Connection. Settings format will (by default) be Database URL.
  7. Enter this in Database URL: jdbc:spark://{{workspace}}:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/{{endpoint_id}};
  8. Replace the values of {{workspace}} and {{endpoint_id}}. workspace_id is just the DNS name of your workspace, and endpoint_id can be found in Databricks clicking in the left nav on SQL > SQL Endpoints > Connection Details. It's in the last fragment in the HTTP path field e.g. /sql/1.0/endpoints/{{workspace_id}}
  9. In the same settings page in Databricks, click "Create a personal access token" (PAT) and copy the value
  10. Back in DbVisualizer in the Connection page, set Database Userid to "token" and Database Password to the value of the PAT
  11. Now you can connect!
  12. Note that you may have to issue a "use <database_name>" statement before your query (in case you get a "table or view not found" error

View solution in original post

14 REPLIES 14

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

Hi @prasad vaze​ - definitely, you can use dbeaver or DbVisualizer to do this. Download the drivers first and connect with one of these tools.

Hi Bilal, Old post, but I am trying to do the same, but with Databricks in Azure.

I downloaded the JDBC Driver following instructions here : https://docs.databricks.com/dev-tools/dbeaver.html

I have my Databricks cluster running, but when I then try to connect from Dbeaver, I get this error :

[Simba][SparkJDBCDriver](500593) Communication link failure. Failed to connect to server. Reason: HTTP Response code: 401, Error message: Unknown.

I have Googled around and cannot seem to find an answer. I have Can Manage permissions on the cluster, I did generate the Personal Access Token and it is part of the URL with userid set to token

Read through this page on MS site and I seem to have done everything, yet it does not work

Databricks ODBC and JDBC drivers - Azure Databricks | Microsoft Docs

@Bill Davis​ 

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@Navin Ladda​  apologies for the late reply. I went through the end to end process and verified that I'm indeed able to connect from dbeaver to Azure Databricks. Did you read the docs and exactly follow the steps? One common mistake is to not enter the JDBC URL correctly

NavinL
New Contributor III

I was able to resolve this. I did have an error in the JDBC URL where I still have < > surrounding the token value. Once I removed it, was able to connect Dbeaver to Databricks cluster.

The method to have it setup via Active directory was head spinning for me to follow along. But for now this time limited token method is fine to explore working with Databricks.

prasadvaze
Valued Contributor II

Thanks @Bilal Aslam​ I was able to connect via SQL Workbench/j by following this https://docs.databricks.com/integrations/bi/workbenchj.html and then also successfully connected from DBVisualizer

Trying to figure out how to connect via SSMS

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@prasad vaze​ I don't use SSMS (not a Windows user), and I'm not even sure if it will work with Databricks. However, I did want to list the steps to get DBvisualizer to work:

  1. Download Databricks' JDBC drivers and extract them on your computer
  2. Start DbVisualizer and in Preferences > General > Driver Manager, add the folder where you extracted the driver to the search path. Click Ok to close the preferences dialog.
  3. Click Tools > Driver Manager and add a JDBC (Generic) driver
  4. Name the new driver "Databricks" and provide this JDBC URL: jdbc:spark://<workspace>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/<endpoint>
  5. Driver class should be com.simba.spark.jdbc.Driver
  6. In the left pane, add a new Connection. Settings format will (by default) be Database URL.
  7. Enter this in Database URL: jdbc:spark://{{workspace}}:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/endpoints/{{endpoint_id}};
  8. Replace the values of {{workspace}} and {{endpoint_id}}. workspace_id is just the DNS name of your workspace, and endpoint_id can be found in Databricks clicking in the left nav on SQL > SQL Endpoints > Connection Details. It's in the last fragment in the HTTP path field e.g. /sql/1.0/endpoints/{{workspace_id}}
  9. In the same settings page in Databricks, click "Create a personal access token" (PAT) and copy the value
  10. Back in DbVisualizer in the Connection page, set Database Userid to "token" and Database Password to the value of the PAT
  11. Now you can connect!
  12. Note that you may have to issue a "use <database_name>" statement before your query (in case you get a "table or view not found" error

prasadvaze
Valued Contributor II

I am able to connect via DBVisualizer and SQL workbench to delta lake. But our users don't want to use anything other than SSMS (sql server management studio) or Azure data studio. These are defacto clients for sql server users. It will be a great opportunity for delta lake (data bricks) to enable these tools for connection to delta lake ( many sql server users will convert to delta lake users)

prasadvaze
Valued Contributor II

@Bilal Aslam​ I still think databricks should provide a way to connect to delta tables from SSMS or azure data studio desktop clients. Because there are so many users of these tools , once delta tables are accessible from these tools then the delta lake will catch fire ( easy to talk users into switching from sql server/ azure synapse). Microsoft will not provide this obviously so its in databricks benefit to resolve this limitation. right now many business users in my company prefer to use sql db just because they can't query delta lake using their familiar client (SSMS / azure data studio)

BilalAslamDbrx
Honored Contributor III
Honored Contributor III

@prasad vaze​  I totally agree. I'd love to support connecting from Azure Data Studio and even SSMS!

is databricks working on developing the client tool?

@prasad vaze​ we are working on integrating with lots of client tools.

@Bilal Aslam​ Great , I am looking forward to SSMS or Azure data studio integration. I understand that connecting to delta lake via SQL Endpoint or spark cluster is better performance experience than connecting via DSR (delta standalone reader library) . SSMS/Studio clients running on normal desktop/laptop don't have local spark running but they can always connect to delta lake via cluster/SQL Endpoint . If that happens then many of my users will start using delta lake instead of azure synapse

NavinL
New Contributor III

@prasad vaze​ Your comment made me think of our own environment. What is DSR ?

I have only briefly used Azure Data Studio for connecting to Azure SQL as I prefer Dbeaver. But your comment about using Delta Lake vs Azure Synapse...can you please elaborate as we plan to use Synapse as well. Do you use Azure Synapse as the final layer in your environment from where end users get their data/connect to Power BI etc ? Or are you envisioning being able to have these users connect directly to Delta Lake as Delta lake claims to give Relational database like Update/Delete capabilities with improved query performance ?

prasadvaze
Valued Contributor II

DSR is Delta Standalone Reader. see more here - https://docs.delta.io/latest/delta-standalone.html

Its a crate (and also now a py library) that allows you to connect to delta tables without using spark (e.g. directly from python and not using pyspark) so its good use case if you don't or can't have spark environment (e.g. user desktop or azure function app ) AND this is developer friendly but not business user friendly way to query delta lake

In our case , we "have to" use Synapse SQL DW as data consumption layer for end users / BI tools who are so much accustomed to traditional desktop client-to -sql server way of doing things. Its bit hard to move them from T-sql to spark sql but what has been harder is to move them from Sql server to delta lake because of the lack of direct connectivity from our most popular desktop clients (SSMS, azure data studio) to delta lake. Its not really direct connection because, there has to be a spark cluster between desktop client and backend delta lake . The cluster provides JDBC driver.

I see a potential in delta lake to become final data consumption layer someday. Performance and cost is better but user-friendliness its not there yet in my opinion due to above limitations and some other limitations (for e.g. Implementing row/column level security in delta lake is more involved than in RDBMS. Either you have to use OTS tool or databricks proprietary tool Unity Catalog which is still in private preview )

Join 100K+ Data Experts: Register Now & Grow with Us!

Excited to expand your horizons with us? Click here to Register and begin your journey to success!

Already a member? Login and join your local regional user group! If there isn’t one near you, fill out this form and we’ll create one for you to join!