How to cache on 500 billion rows
03-26-2024 06:04 AM - edited 03-26-2024 06:05 AM
Hello!
I'm using a serverless SQL warehouse on Databricks, and I have a Delta table with 500 billion rows. I'm trying to filter it down to around 7 billion rows and then cache that dataset so I can reuse it in other queries and make them run faster.
Running the cache statement takes about 1 second and gives no error or warning.
But when I select from the cached table, I get an error saying it cannot be found.
This is what I'm doing:
CACHE TABLE table_filtered_cache AS
SELECT * FROM prod_datalake.table a
WHERE a.year >= 2023 etc
and then
SELECT COUNT(*) FROM table_filtered_cache
What am I doing wrong, and what would you advise me to do?
Labels: Delta Lake
03-26-2024 06:20 AM
Can you try creating a global temp view of the cached result?
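As a rough sketch of that suggestion (assuming a classic Spark cluster rather than a SQL warehouse; the view name is illustrative, and the table and filter are taken from the question):

CREATE GLOBAL TEMPORARY VIEW table_filtered_view AS
SELECT * FROM prod_datalake.table a
WHERE a.year >= 2023;

-- global temp views are resolved through the global_temp schema
CACHE TABLE global_temp.table_filtered_view;

SELECT COUNT(*) FROM global_temp.table_filtered_view;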
03-26-2024 06:37 AM
"GLOBAL TEMPORARY VIEW is not supported on a SQL warehouse."
03-26-2024 06:44 AM
I missed the 'serverless SQL' part. CACHE TABLE is a Spark command; I don't think it works on serverless SQL.
Here is how caching works on DBSQL.
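For a serverless SQL warehouse, one alternative (only a sketch; the target table name is illustrative) is to materialize the filtered subset as its own Delta table and query that, letting the warehouse's automatic disk and query result caching handle repeated reads:

CREATE OR REPLACE TABLE prod_datalake.table_filtered AS
SELECT * FROM prod_datalake.table a
WHERE a.year >= 2023;

-- subsequent queries run against the much smaller table
SELECT COUNT(*) FROM prod_datalake.table_filtered;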

