SQL IDE Support
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
08-23-2022 01:15 AM
How to use a SQL IDE with Databricks SQL
Databricks provides SQL IDE support using DataGrip and DBeaver with Databricks SQL.
Let us know in the comments if you've used DataDrip or DBeaver with Databricks! Let us know if there are any other SQL IDEs you'd like use to support.
- Labels:
-
Databricks SQL
-
Datagrip
-
DBeaver
-
Ide
-
Proven Practice
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-08-2022 10:25 PM
Is there a plan for a first-p(a)rty IDE? something that not only lets you execute sql but also manage jobs/view query plans/execution plans etc. Similar to SSMS for sql server/azure synapse
That for me has been one of the biggest dev blockers with MPPs; the lack of first-p(a)rty dev tools (a textarea inside a webpage is not an ide)
i use workbench/j as i've found it to be the 'least bad'. Still have issues though - I need to manually reconnect when i've left it idle for 5min. The other annoyance is how it doesn't stream data into the datagrid
I dont like how none of these sql ides stream data. They all wait for the full dataset to be loaded into your local memory then spend x amount of time serializing it to display in a grid. Yeah i can use LIMIT, but i've found that that changes the query plan
and sometimes when i'm running a complex long running query and i've messed something up. I dont realise 'til i get the full dataset back
whereas if it started serializing the resultset into the display datagrid as soon as it got a few dozen rows back i'd notice it and kill it rather than waiting 10-20min
ps - why is p(a)rty a banned word on these forums? dont like fun?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-09-2022 11:34 AM
Hi @Jakub K the Databricks SQL UI is meant to be the first-p(a)rty IDE and we are continuously iterating on it including upcoming improvements to the details available in the query history UI which today includes a detailed breakdown of the query profile. You can also use the EXPLAIN SQL statement for the explain plan before running the statement. With Databricks SQL and Unity Catalog we have also started adding information schema tables that you can query with infortmation about table metadata.
Curious to understand more about your ask on manage jobs as well as your ask on streaming data. For the latter, you can use Dashboards in Databricks SQL that you can set to refresh every minute based on the available data. Would a quicker refresh time solve your requirement here?
If you do wish to use SSMS with Databricks there is a blog from my colleague Kyle Hale on how to do this.
It sounds like the main thing you are looking for is a visual query plan before running the query? Is that accurate?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-11-2022 10:49 PM
It'll be good to have sql query-able metadata information_schema views/objects for automation etc
As an sql dev, i write and execute code many hundreds of times a day
one feedback loop i've become used to on other platforms to is to be able to write an sql select query, hit execute (and irrespective of how many rows are in the final dataset), and start seeing the rows streaming within a few sec - basically as soon as the query begins executing. Does that make sense?
an example
0sec: submit sql query
1-10 sec; generate plan, send to workers
11-100sec; query is executed and data sent to ide
== 15-105 sec: some rows start appearing in ide, continue to stream in until query is finished
so I start seeing the result of my query after ~15 sec, and long before it's finished returning all data
But in workbench j, other jdbc sql clients what I see is:
0sec: submit sql query
1-10 sec; generate plan, send to workers
11-100sec; query is executed and data sent to ide
101-130 sec: IDE churns thru resultset to serialise it into a visual/grid all in one go
131 sec: i see all the data
difference: see the first rows of data after ~15sec vs 2+minutes
multiply that a couple hundred times a day and it adds up
I find this useful for
1: implicitly gauging the performance of my queries
3. agile data dev - i can write a query w/o spending time thinking about hand crafting exclusions/filters or how many rows i might need
2: running tests/debugging prod code output interactively - because the data returns as soon as some data is available and i can kill the query once i know enough
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-11-2022 09:54 PM
Use dbeaver and databricks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
06-17-2024 11:34 AM
dbeaver is perfectly working fine but I fount one issue it wont show the correct error for query.