Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Why does the same Databricks SQL query take different time to run?

Pradip007
Visitor

Hi all,

I'm using Databricks Free Edition with a serverless SQL warehouse. I'm the only user in this workspace.

Warehouse config:
- Type: Serverless SQL
- Size: Large
- Max clusters: 2

Query:

SELECT
  w.workspace_name,
  ROUND(SUM(u.usage_quantity * COALESCE(lp.pricing.effective_list.default, 0)), 2) AS total_cost_usd
FROM system.billing.usage u
JOIN system.billing.list_prices lp
  ON u.sku_name = lp.sku_name
  AND u.cloud = lp.cloud
  AND u.usage_start_time >= lp.price_start_time
  AND (lp.price_end_time IS NULL OR u.usage_start_time < lp.price_end_time)
JOIN system.access.workspaces_latest w
  ON u.workspace_id = w.workspace_id
WHERE
  u.usage_start_time >= CURRENT_DATE - INTERVAL '30' DAY
GROUP BY
  w.workspace_name
ORDER BY
  total_cost_usd DESC;

Two back-to-back runs on the same warehouse give me:

Run 1:
- Query wall-clock duration: 2 min 16 s
- Aggregated task time: 1.02 s
- Rows read: 334, Bytes read: 55.45 KB
- Bytes read from cache: 23%

Run 2:
- Query wall-clock duration: 4 min 38 s
- Aggregated task time: 92 ms
- Rows read: 1,095, Bytes read: 68.32 KB
- Bytes read from cache: 100%

So the second run is actually "lighter" work (shorter task time, all data from cache), but the wall-clock is roughly double. I also see from the docs that wall-clock duration includes queuing/scheduling and possibly serverless startup.
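To make the gap concrete, the numbers above can be turned into a quick back-of-envelope calculation (plain SQL over an inline VALUES table, no system tables involved; durations converted to milliseconds):

```sql
-- Share of each run's wall-clock that was NOT task execution.
-- 2 min 16 s = 136,000 ms; 4 min 38 s = 278,000 ms (from the stats above).
SELECT
  run,
  wall_clock_ms - task_time_ms AS non_execution_ms,
  ROUND(100.0 * (wall_clock_ms - task_time_ms) / wall_clock_ms, 2) AS non_execution_pct
FROM VALUES
  ('run 1', 136000, 1020),
  ('run 2', 278000,   92) AS t(run, wall_clock_ms, task_time_ms);
```

Both runs come out above 99% non-execution time, which is why I suspect queuing or startup rather than the query itself.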

My questions:

1. How exactly does serverless SQL compute work in Databricks Free Edition?
- Is the serverless pool shared with other customers/accounts in the same region, even if I have only one user in my workspace?
- Are there any special quota or throttling rules for Free Edition that could explain longer waits or cold starts?

2. Why might the second run be slower in wall-clock even though bytes are 100% from cache and aggregated task time is only 92 ms?
- Is this likely due to waiting for serverless capacity (internal queue) or warehouse startup/autoscaling?
- Which fields in system tables or the query profile should I look at (for example, waiting_at_capacity_duration_ms, scheduling/queued time, etc.) to confirm that the extra minutes are from waiting, not execution?

 

 

2 REPLIES

amirabedhiafi
New Contributor II

Hello @Pradip007!

Yes, I think that is possible. From your numbers, the query execution itself was not the slow part (in run 2, only 92 ms of aggregated task time against 4 min 38 s of wall-clock).

Databricks says the query wall-clock duration includes time from scheduling to the end of execution, with breakdowns for scheduling, optimization/pruning, and execution. Aggregated task time is only the combined task execution time across cores, and it can be much smaller than wall-clock if tasks were waiting for resources. https://docs.databricks.com/aws/en/sql/user/queries/query-profile

Your query reads only 55-68 KB. That is tiny, so a 2-4 minute runtime is unlikely to be caused by the SQL logic itself.

For the second run, most of the time was likely spent in one of these (my main assumptions):

  • waiting for compute to be available
  • waiting for serverless warehouse startup or provisioning
  • scheduling or queueing
  • warehouse autoscaling or capacity allocation
  • client-side wait

For serverless SQL, Databricks uses intelligent workload management: when a query arrives, it checks available capacity. If capacity is not available, the query can be queued and autoscaling may provision more capacity. (https://docs.databricks.com/aws/en/compute/sql-warehouse/warehouse-behavior)

Also, don't forget that Free Edition has quota limitations, no guaranteed SLA, and is subject to fair usage. Users only get serverless compute, and SQL warehouses are limited to one warehouse with 2X-Small cluster size. So if you see size Large or max clusters 2, I would double-check whether this is really Free Edition behavior.

Databricks does not expose the exact internal capacity-pool behavior per customer, since serverless compute is Databricks-managed (not in your own cloud account) and Databricks automatically allocates and manages the resources.

So even if you are the only user in your workspace, you are still depending on Databricks-managed regional serverless capacity.

system.query.history will help you a lot here:

SELECT
  statement_id,
  start_time,
  end_time,
  total_duration_ms,
  waiting_for_compute_duration_ms,
  waiting_at_capacity_duration_ms,
  compilation_duration_ms,
  execution_duration_ms,
  total_task_duration_ms,
  result_fetch_duration_ms,
  read_rows,
  read_bytes,
  read_io_cache_percent,
  from_results_cache,
  cache_origin_statement_id
FROM system.query.history
WHERE start_time >= NOW() - INTERVAL 1 DAY
  AND statement_text ILIKE '%system.billing.usage%'
ORDER BY start_time DESC;

focus on:

  • waiting_for_compute_duration_ms: time waiting for compute resources to be provisioned
  • waiting_at_capacity_duration_ms: time waiting in queue for available compute capacity
  • execution_duration_ms: actual execution time
  • compilation_duration_ms: metadata loading and query optimization time
  • total_task_duration_ms: aggregated task time across cores
  • result_fetch_duration_ms: time spent fetching the query results after execution finished

You can find what you need here: https://docs.databricks.com/aws/en/admin/system-tables/query-history
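On top of the raw columns, a derived wait share makes the picture obvious at a glance (a sketch using only the fields listed above; adjust the time filter to cover your runs):

```sql
-- How much of each statement's wall-clock was spent waiting rather than executing.
SELECT
  statement_id,
  total_duration_ms,
  execution_duration_ms,
  waiting_for_compute_duration_ms + waiting_at_capacity_duration_ms AS total_wait_ms,
  ROUND(100.0 * (waiting_for_compute_duration_ms + waiting_at_capacity_duration_ms)
        / total_duration_ms, 1) AS wait_pct
FROM system.query.history
WHERE start_time >= NOW() - INTERVAL 1 DAY
ORDER BY wait_pct DESC;
```

If wait_pct is close to 100 for the slow run, the extra minutes came from waiting, not execution.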

You can also check warehouse events:

SELECT
  event_time,
  warehouse_id,
  event_type,
  cluster_count
FROM system.compute.warehouse_events
WHERE warehouse_id = '<warehouse-id>'
  AND event_time >= NOW() - INTERVAL 1 DAY
ORDER BY event_time DESC;

system.compute.warehouse_events tracks warehouse start, stop, scale-up, and scale-down events.
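One way to correlate the two tables: take the slow statement's start_time and end_time from system.query.history and look for events inside that window (a sketch; the timestamp placeholders are values you substitute from your own query history):

```sql
-- Warehouse events that fired during the slow query's lifetime.
SELECT event_time, event_type, cluster_count
FROM system.compute.warehouse_events
WHERE warehouse_id = '<warehouse-id>'
  AND event_time BETWEEN TIMESTAMP '<query-start-time>' AND TIMESTAMP '<query-end-time>'
ORDER BY event_time;
```

A scale-up or start event landing inside that window would confirm the wait was capacity provisioning.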

https://docs.databricks.com/aws/en/compute/sql-warehouse/monitor/queries

If this answer resolves your question, could you please mark it as "Accept as Solution"? It will help other users quickly find the correct fix.

Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP

Ashwin_DSA
Databricks Employee

Hi @Pradip007,

This is expected in the Free Edition. In the Free Edition, your queries run on a shared regional serverless pool with a small effective warehouse capacity and best-effort access. Even if you're the only user in your workspace, you're sharing the underlying pool with other workspaces in the same region, with strong isolation at the warehouse/session level. This behaviour is normal for a query that spent most of its life waiting for serverless capacity, which you can confirm by checking waiting_at_capacity_duration_ms in system.query.history and correlating with system.compute.warehouse_events around that time.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***