<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Comparing Databricks Serverless Warehouse with Snowflake Virtual Warehouse for specific query in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/comparing-databricks-serverless-warehouse-with-snowflake-virtual/m-p/135176#M50294</link>
    <description>Latest reply in the Data Engineering thread “Comparing Databricks Serverless Warehouse with Snowflake Virtual Warehouse for specific query”.</description>
    <pubDate>Fri, 17 Oct 2025 00:00:08 GMT</pubDate>
    <dc:creator>Krishna_S</dc:creator>
    <dc:date>2025-10-17T00:00:08Z</dc:date>
    <item>
      <title>Comparing Databricks Serverless Warehouse with Snowflake Virtual Warehouse for specific query</title>
      <link>https://community.databricks.com/t5/data-engineering/comparing-databricks-serverless-warehouse-with-snowflake-virtual/m-p/122886#M46895</link>
      <description>&lt;P&gt;Hey,&lt;/P&gt;&lt;P&gt;I would like to compare the runtime of one specific query by running it on a Databricks Serverless Warehouse and a Snowflake Virtual Warehouse.&lt;/P&gt;&lt;P&gt;I created a table with the exact same structure and the exact same dataset in both warehouses.&lt;/P&gt;&lt;P&gt;The dataset itself is quite simple: it has an id column (int), a name column (string), and an array&amp;lt;double&amp;gt; with 5000 elements. The table has around 1.5M rows, ~40GB in size.&lt;/P&gt;&lt;P&gt;I want to run a very simple query to compare the runtimes, but I need to make sure the entire table is scanned.&lt;/P&gt;&lt;P&gt;The query is as simple as `select * from table`. It works in Snowflake, but I cannot return all results in the Databricks Warehouse. Even when I choose Download results, it only retrieves part of them.&lt;/P&gt;&lt;P&gt;I tried to measure it by running CTAS and INSERT into a separate table, but those take just a few seconds, so the result won't help me.&lt;/P&gt;&lt;P&gt;The reason I chose this method is that we also have other engines where we executed the exact same queries, and all of them yielded the result. I would like to avoid using predicates to narrow down the results.&lt;/P&gt;&lt;P&gt;I also tried using a Databricks Spark cluster, and it worked fine.&lt;/P&gt;&lt;P&gt;Any ideas how to tackle this? Thanks!&lt;/P&gt;</description>
      <pubDate>Wed, 25 Jun 2025 20:10:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/comparing-databricks-serverless-warehouse-with-snowflake-virtual/m-p/122886#M46895</guid>
      <dc:creator>xx123</dc:creator>
      <dc:date>2025-06-25T20:10:57Z</dc:date>
    </item>
    <item>
      <title>Re: Comparing Databricks Serverless Warehouse with Snowflake Virtual Warehouse for specific query</title>
      <link>https://community.databricks.com/t5/data-engineering/comparing-databricks-serverless-warehouse-with-snowflake-virtual/m-p/135176#M50294</link>
      <description>
&lt;P data-start="0" data-end="311"&gt;You’re running into a Databricks SQL results delivery limit—the UI (and even “Download results”) isn’t meant to stream &lt;STRONG data-start="123" data-end="164"&gt;1.5M × (id, name, 5,000-double array)&lt;/STRONG&gt; back to your browser. That’s why &lt;CODE data-start="198" data-end="208"&gt;SELECT *&lt;/CODE&gt; “works” on Snowflake’s console but not in the DBSQL UI. So don’t measure by returning the whole table.&lt;/P&gt;
&lt;P data-start="313" data-end="414"&gt;Here’s how to do a fair, full-scan runtime comparison without changing the data or adding predicates:&lt;/P&gt;
&lt;H1 data-start="416" data-end="467"&gt;Force a full scan but return &lt;STRONG data-start="458" data-end="467"&gt;1 row&lt;/STRONG&gt;&lt;/H1&gt;
&lt;P data-start="468" data-end="615"&gt;Make the engine read &lt;STRONG data-start="489" data-end="529"&gt;every column and&amp;nbsp;&lt;/STRONG&gt;&lt;SPAN&gt;&lt;STRONG&gt;array element&lt;/STRONG&gt;&amp;nbsp;and reduce them&lt;/SPAN&gt;&amp;nbsp;to a checksum. This avoids UI limits while still measuring scan/compute.&lt;/P&gt;
&lt;P data-start="617" data-end="658"&gt;&lt;STRONG data-start="617" data-end="658"&gt;Databricks SQL&amp;nbsp;&lt;/STRONG&gt;&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- Optional: avoid cached results
SET use_cached_result = false;

SELECT
  SUM(COALESCE(CAST(id AS BIGINT), 0))                                        AS s_id,
  SUM(COALESCE(LENGTH(name), 0))                                              AS s_name_len,
  -- read every element of the 5k-length array&amp;lt;double&amp;gt;
  SUM(AGGREGATE(arr, CAST(0.0 AS DOUBLE), (acc, x) -&amp;gt; acc + COALESCE(x, 0.0))) AS s_arr_sum
FROM your_catalog.your_schema.your_table;
&lt;/LI-CODE&gt;
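&lt;P&gt;&lt;STRONG&gt;Snowflake&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;A rough Snowflake counterpart for the same checksum idea (database/schema/table names are placeholders, and it assumes the array column is stored as a Snowflake ARRAY of doubles). If your account has the REDUCE higher-order function you can mirror the single-pass lambda above; otherwise LATERAL FLATTEN is the long-standing way to traverse array elements, at the cost of a second scan:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- Optional: avoid cached results
ALTER SESSION SET USE_CACHED_RESULT = FALSE;

-- Scalar columns in one pass
SELECT
  SUM(COALESCE(id::BIGINT, 0))   AS s_id,
  SUM(COALESCE(LENGTH(name), 0)) AS s_name_len
FROM your_db.your_schema.your_table;

-- Array elements: FLATTEN expands each 5000-element array into one row per element
SELECT
  SUM(COALESCE(f.value::DOUBLE, 0.0)) AS s_arr_sum
FROM your_db.your_schema.your_table t,
     LATERAL FLATTEN(input =&amp;gt; t.arr) f;
&lt;/LI-CODE&gt;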
&lt;UL data-start="1468" data-end="1791"&gt;
&lt;LI data-start="1468" data-end="1556"&gt;Do the same SQL in snowflake as well to compare&lt;/LI&gt;
&lt;LI data-start="1468" data-end="1556"&gt;
&lt;P data-start="1470" data-end="1556"&gt;This &lt;STRONG data-start="1475" data-end="1503"&gt;forces a full table scan&lt;/STRONG&gt;, column decode, and array traversal on both engines.&lt;/P&gt;
&lt;/LI&gt;
&lt;LI data-start="1557" data-end="1791"&gt;
&lt;P data-start="1559" data-end="1791"&gt;Measure &lt;STRONG data-start="1567" data-end="1590"&gt;server-side runtime&lt;/STRONG&gt; from each platform’s query history/profile (Databricks: Query Profile / query history; Snowflake: Query History). You’ll also see &lt;STRONG data-start="1721" data-end="1750"&gt;bytes read / rows scanned&lt;/STRONG&gt; to verify it wasn’t a metadata shortcut.&lt;/P&gt;
&lt;/LI&gt;
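&lt;LI&gt;
&lt;P&gt;On the Snowflake side, one way to pull those numbers programmatically (a sketch; the ILIKE pattern is just a placeholder match on the test query's text) is the INFORMATION_SCHEMA.QUERY_HISTORY table function:&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;-- Server-side elapsed time (ms) and bytes scanned for recent matching queries
SELECT query_text, total_elapsed_time, bytes_scanned, rows_produced
FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE query_text ILIKE '%s_arr_sum%'
ORDER BY start_time DESC
LIMIT 5;
&lt;/LI-CODE&gt;
&lt;/LI&gt;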
&lt;/UL&gt;</description>
      <pubDate>Fri, 17 Oct 2025 00:00:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/comparing-databricks-serverless-warehouse-with-snowflake-virtual/m-p/135176#M50294</guid>
      <dc:creator>Krishna_S</dc:creator>
      <dc:date>2025-10-17T00:00:08Z</dc:date>
    </item>
  </channel>
</rss>

