Hey,
I would like to compare the runtime of one specific query by running it on Databricks Serverless Warehouse and Snowflake Virtual Warehouse.
I created a table with exactly the same structure and exactly the same dataset in both warehouses.
The dataset itself is quite simple: it has an id column (int), a name column (string), and an array<double> column with 5000 elements. The table has around 1.5M rows and is ~40GB in size.
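As a rough sanity check on that size, here is a back-of-the-envelope sketch. It assumes 8 bytes per double and ignores the id and name columns, compression, and any storage overhead, so it is only an estimate of the raw array payload:

```python
# Back-of-the-envelope size of the array column (assumed values, not measured).
ROWS = 1_500_000          # approximate row count from the post
ELEMENTS_PER_ROW = 5_000  # array<double> length from the post
BYTES_PER_DOUBLE = 8      # assumption: 64-bit IEEE 754 double

raw_bytes = ROWS * ELEMENTS_PER_ROW * BYTES_PER_DOUBLE
raw_gb = raw_bytes / 1e9  # decimal gigabytes

print(f"raw array payload: {raw_gb:.0f} GB")
```

Under those assumptions the raw array data comes to about 60 GB, which would be consistent with the ~40GB figure being a compressed on-disk size.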
I want to run a very simple query to compare the runtimes, but I need to make sure the entire table is scanned.
The query is as simple as `select * from table`. It works in Snowflake, but I cannot return all results in the Databricks Warehouse. Even when I choose Download results, it only retrieves part of them.
I tried to measure it by running a CTAS and an INSERT into a separate table, but that takes just a few seconds, so the result won't help me.
The reason I chose this method is that we also have other engines where we executed the exact same queries, and all of them returned the full result. I would like to avoid using predicates to narrow down the results.
I also tried using Databricks Spark Cluster, and it worked fine.
Any ideas how to tackle this? Thanks!