Databricks Community

xx123 · 06-25-2025

Hey,

I would like to compare the runtime of one specific query by running it on a Databricks serverless SQL warehouse and a Snowflake virtual warehouse.

I created a table with the exact same structure and the exact same dataset in both warehouses.

The dataset itself is quite simple: it has an id column (int), a name column (string), and an array<double> column with 5,000 elements. The table has around 1.5M rows and is ~40 GB in size.

I want to run a very simple query to compare the runtimes, but I need to make sure the entire table is scanned.

The query is as simple as `select * from table`. It works in Snowflake, but I cannot return all results in the Databricks warehouse. Even when I choose Download results, it retrieves only part of them.

I tried to measure it by running CTAS and INSERT into a separate table, but that takes just a few seconds, so the result won't help me.

I chose this method because we also have other engines where we executed the exact same queries, and all of them returned the full result. I would like to avoid using predicates to narrow down the results.

I also tried using Databricks Spark Cluster, and it worked fine.

Any ideas how to tackle this? Thanks!

Krishna_S · 10-16-2025

You’re running into a Databricks SQL results delivery limit: the UI (and even “Download results”) isn’t meant to stream 1.5M rows of (id, name, 5,000-double array) back to your browser. That’s why SELECT * “works” in Snowflake’s console but not in the DBSQL UI. So don’t measure by returning the whole table.
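For a sense of scale, here is a rough estimate of the raw array payload a full `SELECT *` has to deliver, assuming 8-byte doubles and ignoring the id and name columns and any serialization overhead:

$$
1.5\times10^{6}\ \text{rows} \times 5000\ \tfrac{\text{doubles}}{\text{row}} \times 8\ \tfrac{\text{bytes}}{\text{double}} = 6\times10^{10}\ \text{bytes} \approx 60\ \text{GB}
$$

That is more than the ~40 GB quoted for the table (presumably its compressed on-disk size), and either way far more than is practical to stream to a browser.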

Here’s how to do a fair, full-scan runtime comparison without changing the data or adding predicates:

Force a full scan but return 1 row

Make the engine read every column and array element and reduce them to a checksum. This avoids UI limits while still measuring scan/compute.

Databricks SQL

```sql
-- Optional: avoid cached results
SET use_cached_result = false;

SELECT
  SUM(COALESCE(CAST(id AS BIGINT), 0))                                        AS s_id,
  SUM(COALESCE(LENGTH(name), 0))                                              AS s_name_len,
  -- read every element of the 5k-length array<double> (arr = your array column's name)
  SUM(AGGREGATE(arr, CAST(0.0 AS DOUBLE), (acc, x) -> acc + COALESCE(x, 0.0))) AS s_arr_sum
FROM your_catalog.your_schema.your_table;
```
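Snowflake has no `AGGREGATE` function, so the array part of the checksum needs porting before it can run there. Below is a minimal sketch, assuming `arr` is a Snowflake ARRAY column and that the table and column names are the same placeholders as above (`your_db.your_schema.your_table` is hypothetical). It unnests the array with `LATERAL FLATTEN` and counts `id` and `name` only once per source row:

```sql
-- Snowflake sketch; table path and column names (id, name, arr) are placeholders
-- Optional: avoid Snowflake's result cache
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

SELECT
  -- FLATTEN emits one row per array element, so count id/name only once per source row:
  -- on element 0, or on the single NULL-index row that OUTER => TRUE emits for an empty/NULL array
  SUM(IFF(f.index = 0 OR f.index IS NULL, COALESCE(t.id, 0), 0))           AS s_id,
  SUM(IFF(f.index = 0 OR f.index IS NULL, COALESCE(LENGTH(t.name), 0), 0)) AS s_name_len,
  -- read every element of the array
  SUM(COALESCE(f.value::DOUBLE, 0))                                         AS s_arr_sum
FROM your_db.your_schema.your_table t,
     LATERAL FLATTEN(input => t.arr, outer => TRUE) f;
```

Note that this expands the array into roughly 7.5 billion rows before aggregating, so its execution plan is not identical to the per-row `AGGREGATE` on Databricks. If your Snowflake account offers the `REDUCE` higher-order array function (check the current docs), a per-row `REDUCE` would mirror the Databricks query more closely.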

Run an equivalent query in Snowflake as well to compare.
This forces a full table scan, column decode, and array traversal on both engines.
Measure server-side runtime from each platform’s query history/profile (Databricks: Query Profile / query history; Snowflake: Query History). Both also show bytes read and rows scanned, so you can verify the engine didn’t take a metadata shortcut.
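To pull the server-side numbers with a query instead of clicking through the UI, something along these lines should work. This is a sketch, assuming system tables are enabled in the Databricks workspace and that the Snowflake session has a current database so `INFORMATION_SCHEMA.QUERY_HISTORY()` resolves. The `LIKE` filter on the alias `s_arr_sum` is just a convenient way to find the checksum runs; verify the column names against each platform’s current documentation.

```sql
-- Databricks SQL: recent runs of the checksum query (requires access to system tables)
SELECT statement_id,
       total_duration_ms,
       execution_duration_ms,
       read_bytes,
       read_rows,
       from_result_cache          -- should be false for a fair measurement
FROM system.query.history
WHERE statement_text LIKE '%s_arr_sum%'
  AND statement_text NOT LIKE '%system.query.history%'   -- exclude this lookup itself
ORDER BY start_time DESC
LIMIT 10;

-- Snowflake: the same information from the QUERY_HISTORY table function
SELECT query_id,
       total_elapsed_time,        -- milliseconds
       execution_time,            -- milliseconds
       bytes_scanned,
       rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%s_arr_sum%'
  AND query_text NOT ILIKE '%QUERY_HISTORY%'              -- exclude this lookup itself
ORDER BY start_time DESC
LIMIT 10;
```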
