cancel
Showing results forย 
Search instead forย 
Did you mean:ย 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forย 
Search instead forย 
Did you mean:ย 

Databricks SQL script slow execution in workflows using serverless

iamgoda
New Contributor III

I am running a very simple SQL script within a notebook, using an X-Small SQL Serverless warehouse (that is already running). The execution time is different depending on how it's run:

  • 4s if run interactively (and through SQL editor)

  • 26s if run within a workflow as a notebook task

If you look at the query history screenshot below you see the individual queries are taking the same amount of time to execute, but for some reason there's a few seconds delay between end of one query and start of the next query (line) when running through workflow vs interactively (both using SQL serverless warehouse). I tried both current and preview channel and they both behave the same way

I even tried using an all purpose compute cluster (Single node: Standard_DS3_v2 ยท DBR: 15.3) and the times were consistent when running in a workflow vs interactively.

Script

 

 

declare or replace start_time timestamp = current_timestamp();
declare or replace end_time timestamp = current_timestamp();
declare or replace v STRUCT<SOURCE_TBL_MAX_TIMESTAMP STRING, TARGET_TBL_MAX_TIMESTAMP STRING, SOURCE_MAX_TIMESTAMP TIMESTAMP, TARGET_MAX_TIMESTAMP TIMESTAMP, MIN_TIME TIMESTAMP, MAX_TIME TIMESTAMP, MAX_INTERVAL_TIME TIMESTAMP, TABLE_SOURCE STRING, TABLE_TARGET STRING, MERGE_STATEMENT STRING, INTERVAL_COLUMN_NAME STRING, STAGING_TABLE_NAME STRING, STAGING_TABLE_CREATE_STATEMENT STRING, FULL_STAGING_TABLE_NAME STRING>;
select current_timestamp();
select v;
select current_timestamp();
select timestampdiff(MILLISECOND  , start_time, current_timestamp()) as total_time_elapsed_ms;

 

 

Screenshots 

iamgoda_4-1720697910509.png
iamgoda_5-1720697937883.png

Query History

iamgoda_7-1720698691523.png

Further tests

Using a SQL file workflow task, running on X-Small SQL serverless, the same script executes as expected (faster than notebook task):

iamgoda_0-1720701617441.png

Not sure why the type of workflow task would impact the execution of the SQL script on a serverless warehouse.

 

8 REPLIES 8

iamgoda
New Contributor III

@SergeRielau - is this another bug or?

SergeRielau
Valued Contributor

@iamgoda No idea. In DBSQL session variables and their values must be preserved outside the runtime engine (since the session may be migrated), but that doesn't seem to be at play here (and it should not have such an impact).

iamgoda
New Contributor III

Yes and the fact that running it as a SQL script task works. Running it as a notebook task on a SQL serverless instance seems to be the issue.

BilalAslamDbrx
Honored Contributor III

@iamgoda  we are going to look into how to make this faster. There's a poll loop in Databricks Workflows for SQL notebooks (but not for SQL scripts) which causes things to slow down. 

@BilalAslamDbrx thanks for looking into this and confirming the issue. This doesn't happen if the SQL notebook task is run using all purpose compute in a workflow, so wondering if the poll loop only applies to SQL serverless compute? 

BilalAslamDbrx
Honored Contributor III

This adds up. This particular polling inefficiency only applies to the flow of notebooks running on warehouses (not on clusters).

Any idea of whether this will be addressed and when?

BilalAslamDbrx
Honored Contributor III

No not yet

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonโ€™t want to miss the chance to attend and share knowledge.

If there isnโ€™t a group near you, start one and help create a community that brings people together.

Request a New Group