SQL IDE Support

isaac_gritz
Valued Contributor II

How to use a SQL IDE with Databricks SQL

Databricks supports SQL IDEs such as DataGrip and DBeaver with Databricks SQL.
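Both clients connect through the Databricks JDBC driver. A connection URL for a SQL warehouse typically looks something like the line below; the hostname, warehouse HTTP path, and personal access token are placeholders from your own workspace, and older driver versions use a jdbc:spark:// prefix instead:

    jdbc:databricks://<server-hostname>:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=/sql/1.0/warehouses/<warehouse-id>;UID=token;PWD=<personal-access-token>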

Let us know in the comments if you've used DataGrip or DBeaver with Databricks! Let us know if there are any other SQL IDEs you'd like us to support.

4 REPLIES

jakubk
Contributor

Is there a plan for a first-party IDE? Something that not only lets you execute SQL but also manage jobs, view query plans/execution plans, etc., similar to SSMS for SQL Server/Azure Synapse.

That, for me, has been one of the biggest dev blockers with MPPs: the lack of first-party dev tools (a textarea inside a webpage is not an IDE).

I use SQL Workbench/J, as I've found it to be the 'least bad'. I still have issues though: I need to manually reconnect when I've left it idle for 5 minutes. The other annoyance is that it doesn't stream data into the data grid.

I don't like how none of these SQL IDEs stream data. They all wait for the full dataset to be loaded into local memory, then spend some amount of time serializing it to display in a grid. Yes, I can use LIMIT, but I've found that it changes the query plan.

And sometimes, when I'm running a complex long-running query and I've messed something up, I don't realise it until I get the full dataset back.

Whereas if the client started serializing the result set into the display grid as soon as it got a few dozen rows back, I'd notice and kill the query rather than waiting 10-20 minutes.


isaac_gritz
Valued Contributor II

Hi @Jakub K, the Databricks SQL UI is meant to be the first-party IDE, and we are continuously iterating on it, including upcoming improvements to the details available in the query history UI, which today includes a detailed breakdown of the query profile. You can also use the EXPLAIN SQL statement to see the plan before running a statement. With Databricks SQL and Unity Catalog, we have also started adding information_schema tables that you can query for information about table metadata.
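As a quick illustration of the EXPLAIN statement (the table and column names here are invented):

    -- Inspect the plan without executing the query
    EXPLAIN FORMATTED
    SELECT o.order_id, c.customer_name
    FROM orders o
    JOIN customers c ON o.customer_id = c.customer_id
    WHERE o.order_date >= '2023-01-01';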

I'm curious to understand more about your ask on managing jobs, as well as your ask on streaming data. For the latter, you can use Dashboards in Databricks SQL, which you can set to refresh every minute based on the available data. Would a quicker refresh interval solve your requirement here?

If you do wish to use SSMS with Databricks, there is a blog from my colleague Kyle Hale on how to do this.

It sounds like the main thing you are looking for is a visual query plan before running the query? Is that accurate?

jakubk
Contributor

It'll be good to have SQL-queryable metadata (information_schema views/objects) for automation etc.
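For example, with Unity Catalog's information_schema, metadata lookups like this sketch are already scriptable (the catalog and schema names are placeholders):

    -- List tables and their types within a schema
    SELECT table_catalog, table_schema, table_name, table_type
    FROM my_catalog.information_schema.tables
    WHERE table_schema = 'my_schema';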

As a SQL dev, I write and execute code many hundreds of times a day.

One feedback loop I've become used to on other platforms is being able to write a SQL SELECT query, hit execute and, irrespective of how many rows are in the final dataset, start seeing rows streaming in within a few seconds, basically as soon as the query begins executing. Does that make sense?

An example:

0 sec: submit SQL query

1-10 sec: generate plan, send to workers

11-100 sec: query is executed and data is sent to the IDE

15-105 sec: rows start appearing in the IDE and continue to stream in until the query is finished

So I start seeing the result of my query after ~15 sec, long before it's finished returning all the data.

But in Workbench/J and other JDBC SQL clients, what I see is:

0 sec: submit SQL query

1-10 sec: generate plan, send to workers

11-100 sec: query is executed and data is sent to the IDE

101-130 sec: the IDE churns through the result set to serialize it into a visual grid all in one go

131 sec: I see all the data

The difference: seeing the first rows of data after ~15 sec vs. 2+ minutes.

Multiply that a couple hundred times a day and it adds up.

I find this useful for:

1. Implicitly gauging the performance of my queries.

2. Agile data dev - I can write a query without spending time hand-crafting exclusions/filters or worrying about how many rows I might need.

3. Running tests/debugging prod code output interactively - the data returns as soon as some of it is available, and I can kill the query once I know enough.
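For what it's worth, this batch-at-a-time consumption pattern can be approximated from code today with the Databricks SQL Connector for Python, which fetches a result set in chunks rather than materializing it all at once. A minimal sketch, assuming a SQL warehouse and a personal access token (all connection values and the table name below are placeholders):

    from databricks import sql

    # Connection values are placeholders; take them from the SQL warehouse's
    # connection details page and a personal access token.
    with sql.connect(
        server_hostname="<workspace-hostname>",
        http_path="/sql/1.0/warehouses/<warehouse-id>",
        access_token="<personal-access-token>",
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SELECT * FROM my_big_table")
            # Consume the result set in small batches so bad output can be
            # spotted (and the query killed) early, instead of blocking until
            # the whole dataset has been pulled into local memory.
            while True:
                batch = cursor.fetchmany(100)
                if not batch:
                    break
                for row in batch:
                    print(row)

Whether rows are actually available before the query fully completes still depends on the server side, but at least the client no longer has to serialize the entire grid in one go.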

PriyaAnanthram
Contributor III

Use DBeaver and Databricks.
