Hello @Pradip007 !
Yes, I think that is possible. From your numbers, the query execution itself was not the slow part.
Databricks says query wall clock duration includes the time from scheduling to the end of execution, with breakdowns for scheduling, optimization, pruning, and execution. Aggregated task time is only the combined task execution time across cores, and it can be much smaller than wall clock time if tasks were waiting for resources. https://docs.databricks.com/aws/en/sql/user/queries/query-profile
Your query reads only between 55–68 KB. That is tiny, so a 2-4 minute runtime is unlikely to be caused by the SQL logic itself.
For the second run, my main assumption is that most of the time was spent in one of these:
- waiting for compute to be available
- waiting for serverless warehouse startup or provisioning
- scheduling or queueing
- warehouse autoscaling or capacity allocation
- client side wait
For serverless SQL, Databricks uses intelligent workload management, which means that when a query arrives, it checks available capacity. If capacity is not available, the query can be queued and autoscaling may provision more capacity. (https://docs.databricks.com/aws/en/compute/sql-warehouse/warehouse-behavior)
Also, don't forget that the Free Edition has quota limitations, no guaranteed SLA, and is subject to fair usage, which means users only have serverless compute, and SQL warehouses are limited to one warehouse with a 2X-Small cluster size. So if you see "max clusters: 2", I would double check whether this is really Free Edition behavior.
Databricks does not expose the exact internal capacity pool behavior per customer, since serverless compute is Databricks managed (not in your own cloud account) and Databricks automatically allocates and manages the resources.
So even if you are the only user in your workspace, you still depend on Databricks managed regional serverless capacity.
Querying system.query.history will help you a lot:
SELECT
statement_id,
start_time,
end_time,
total_duration_ms,
waiting_for_compute_duration_ms,
waiting_at_capacity_duration_ms,
compilation_duration_ms,
execution_duration_ms,
total_task_duration_ms,
result_fetch_duration_ms,
read_rows,
read_bytes,
read_io_cache_percent,
from_result_cache,
cache_origin_statement_id
FROM system.query.history
WHERE start_time >= NOW() - INTERVAL 1 DAY
AND statement_text ILIKE '%system.billing.usage%'
ORDER BY start_time DESC;
focus on:
- waiting_for_compute_duration_ms : time waiting for compute resources to be provisioned
- waiting_at_capacity_duration_ms : time waiting in queue for available compute capacity
- execution_duration_ms : actual execution time
- compilation_duration_ms : metadata loading and query optimization time
- total_task_duration_ms : aggregated task time across cores
- result_fetch_duration_ms : time spent fetching the results after execution finished
You can find what you need here : https://docs.databricks.com/aws/en/admin/system-tables/query-history
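To see at a glance where the 2-4 minutes went, you can express each phase as a share of the total wall clock. This is just a sketch built from the documented columns above; adjust the time window and filter to match your statement:

```sql
-- Break the wall-clock duration of recent statements into phases (percentages).
-- All columns come from system.query.history as documented above.
SELECT
  statement_id,
  total_duration_ms,
  ROUND(100.0 * waiting_for_compute_duration_ms / total_duration_ms, 1) AS pct_waiting_for_compute,
  ROUND(100.0 * waiting_at_capacity_duration_ms / total_duration_ms, 1) AS pct_waiting_at_capacity,
  ROUND(100.0 * compilation_duration_ms / total_duration_ms, 1) AS pct_compilation,
  ROUND(100.0 * execution_duration_ms / total_duration_ms, 1) AS pct_execution,
  ROUND(100.0 * result_fetch_duration_ms / total_duration_ms, 1) AS pct_result_fetch
FROM system.query.history
WHERE start_time >= NOW() - INTERVAL 1 DAY
  AND total_duration_ms > 0  -- avoid division by zero
ORDER BY total_duration_ms DESC;
```

If pct_waiting_for_compute or pct_waiting_at_capacity dominates, the slowness is provisioning or queueing, not your SQL.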
You can also check warehouse events:
SELECT
event_time,
warehouse_id,
event_type,
cluster_count
FROM system.compute.warehouse_events
WHERE warehouse_id = '<warehouse-id>'
AND event_time >= NOW() - INTERVAL 1 DAY
ORDER BY event_time DESC;
system.compute.warehouse_events tracks warehouse start, stop, scale-up, and scale-down events.
https://docs.databricks.com/aws/en/compute/sql-warehouse/monitor/queries
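To correlate the two tables, you can check whether the warehouse was starting or scaling right around your slow run. A sketch only; replace the placeholders with your warehouse id and the slow query's start_time from system.query.history:

```sql
-- Warehouse events in a 5-minute window around the slow query's start time.
SELECT
  event_time,
  event_type,
  cluster_count
FROM system.compute.warehouse_events
WHERE warehouse_id = '<warehouse-id>'
  AND event_time BETWEEN TIMESTAMP '<query-start-time>' - INTERVAL 5 MINUTES
                     AND TIMESTAMP '<query-start-time>' + INTERVAL 5 MINUTES
ORDER BY event_time;
```

A starting or scaling event just before the query would confirm that the wait came from warehouse provisioning rather than execution.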
If this answer resolves your question, could you please mark it as "Accept as Solution"? It will help other users quickly find the correct fix.
Senior BI/Data Engineer | Microsoft MVP Data Platform | Microsoft MVP Power BI | Power BI Super User | C# Corner MVP