Data Engineering

SQL IDE Support

isaac_gritz
Databricks Employee

How to use a SQL IDE with Databricks SQL

Databricks SQL supports connecting from SQL IDEs, including DataGrip and DBeaver.
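For reference, both IDEs connect over the Databricks JDBC driver, and the JDBC URL generally takes the shape sketched below. The hostname and HTTP path come from your SQL warehouse's connection details, and the password is a personal access token; everything in angle brackets is a placeholder:

    jdbc:databricks://<server-hostname>:443;httpPath=<http-path>;AuthMech=3;UID=token;PWD=<personal-access-token>

Note that older builds of the driver used a jdbc:spark:// scheme instead, so the exact URL prefix depends on the driver version your IDE ships with.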

Let us know in the comments if you've used DataGrip or DBeaver with Databricks! Let us know if there are any other SQL IDEs you'd like us to support.

5 REPLIES

jakubk
Contributor

Is there a plan for a first-party IDE? Something that not only lets you execute SQL but also manage jobs, view query plans/execution plans, etc., similar to SSMS for SQL Server / Azure Synapse.

For me that has been one of the biggest dev blockers with MPPs: the lack of first-party dev tools (a textarea inside a webpage is not an IDE).

I use SQL Workbench/J, as I've found it to be the 'least bad'. I still have issues though: I need to manually reconnect when I've left it idle for 5 minutes. The other annoyance is that it doesn't stream data into the data grid.

I don't like how none of these SQL IDEs stream data. They all wait for the full dataset to be loaded into your local memory, then spend some amount of time serializing it to display in a grid. Yes, I can use LIMIT, but I've found that it changes the query plan (see the EXPLAIN sketch below).

And sometimes when I'm running a complex, long-running query and I've messed something up, I don't realise it until I get the full dataset back - whereas if the client started serializing the result set into the display grid as soon as it got a few dozen rows back, I'd notice and kill the query rather than waiting 10-20 minutes.
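On the LIMIT point, a quick way to see the plan change is to compare EXPLAIN output with and without the clause (a sketch; the table name is hypothetical):

    EXPLAIN FORMATTED SELECT * FROM sales.orders WHERE status = 'open';
    EXPLAIN FORMATTED SELECT * FROM sales.orders WHERE status = 'open' LIMIT 100;

In Spark-based engines the second plan typically picks up LocalLimit/GlobalLimit (or CollectLimit) operators, which matches the behaviour described above.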

PS: why is 'party' a banned word on these forums? Don't you like fun?

isaac_gritz
Databricks Employee

Hi @Jakub K, the Databricks SQL UI is meant to be the first-party IDE, and we are continuously iterating on it, including upcoming improvements to the query history UI, which today includes a detailed breakdown of the query profile. You can also use the EXPLAIN SQL statement to see the explain plan before running a statement. With Databricks SQL and Unity Catalog we have also started adding information schema tables that you can query for table metadata.
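For instance, both of these are plain SQL statements you can run from any IDE (a sketch; the catalog, schema, and table names are placeholders, and the exact information_schema column set may vary by release):

    -- Inspect the plan before running the statement
    EXPLAIN FORMATTED
    SELECT customer_id, sum(amount)
    FROM main.sales.orders
    GROUP BY customer_id;

    -- Query table metadata from the Unity Catalog information schema
    SELECT table_name, table_type, created
    FROM main.information_schema.tables
    WHERE table_schema = 'sales';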

Curious to understand more about your ask on managing jobs, as well as your ask on streaming data. For the latter, you can use Dashboards in Databricks SQL, which you can set to refresh every minute based on the available data. Would a quicker refresh time solve your requirement here?

If you do wish to use SSMS with Databricks, there is a blog from my colleague Kyle Hale on how to do this.

It sounds like the main thing you are looking for is a visual query plan before running the query? Is that accurate?

It'll be good to have SQL-queryable metadata (information_schema views/objects) for automation, etc.
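The information schema mentioned above already covers part of this; for example, a codegen-style automation query over column metadata might look like this sketch (catalog and schema names are placeholders):

    SELECT table_name, column_name, data_type
    FROM main.information_schema.columns
    WHERE table_schema = 'sales'
    ORDER BY table_name, ordinal_position;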

As a SQL dev, I write and execute code many hundreds of times a day.

One feedback loop I've become used to on other platforms is being able to write a SQL SELECT query, hit execute, and (irrespective of how many rows are in the final dataset) start seeing rows streaming in within a few seconds - basically as soon as the query begins executing. Does that make sense?

An example:

0 sec: submit SQL query
1-10 sec: generate plan, send to workers
11-100 sec: query is executed and data is sent to the IDE
~15-105 sec: rows start appearing in the IDE and continue to stream in until the query finishes

So I start seeing the result of my query after ~15 seconds, long before it has finished returning all the data.

But in Workbench/J and the other JDBC SQL clients, what I see is:

0 sec: submit SQL query
1-10 sec: generate plan, send to workers
11-100 sec: query is executed and data is sent to the IDE
101-130 sec: the IDE churns through the result set to serialize it into a visual grid all in one go
131 sec: I see all the data

The difference: seeing the first rows of data after ~15 seconds vs 2+ minutes.

Multiply that a couple hundred times a day and it adds up.

I find this useful for:

1. Implicitly gauging the performance of my queries.

2. Agile data dev - I can write a query without spending time hand-crafting exclusions/filters or thinking about how many rows I might need.

3. Running tests / debugging prod code output interactively - the data returns as soon as some of it is available, and I can kill the query once I know enough.

PriyaAnanthram
Contributor III

I use DBeaver with Databricks.

Jag
New Contributor III

DBeaver works perfectly fine, but I found one issue: it won't show the correct error for a query.

“Empower Minds, Share Wisdom - Together We Grow!”
