How to Get the Size of Filtered Rows in Databricks SQL
02-06-2025 12:14 AM
I have a query that filters rows from a table based on a timestamp range. The query is as follows:
This query returns 10 rows. I need to calculate the total size (in bytes) of these 10 rows.
What is the best approach in Databricks SQL to get the exact size of these filtered rows?
Is there a built-in function to get this information?
02-06-2025 06:37 AM
You'd have to write the results to a table, then run
DESCRIBE DETAIL table_name
and look at the sizeInBytes column.
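For example, assuming your filtered query selects all columns and writes the result to a Delta table (the target table name here is just a placeholder):

CREATE OR REPLACE TABLE main.default.filtered_rows AS
SELECT *
FROM catalog.schema.table
WHERE ts >= '2025-02-04 00:00:00' AND ts < '2025-02-05 00:00:00';

DESCRIBE DETAIL main.default.filtered_rows;

sizeInBytes is the size of the Parquet files backing the table, so it reflects the compressed on-disk size of those 10 rows rather than their in-memory size.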
02-06-2025 09:28 AM
@vidya_kothavale, try the code block below. Keep in mind that NULL values need handling (see the note after the query).
-- Sum the byte length of each column value cast to STRING
SELECT
  SUM(OCTET_LENGTH(CAST(column1 AS STRING)) + OCTET_LENGTH(CAST(column2 AS STRING)) + OCTET_LENGTH(CAST(COALESCE(column3, '0') AS STRING))) AS bytes,
  SUM(OCTET_LENGTH(CAST(column1 AS STRING)) + OCTET_LENGTH(CAST(column2 AS STRING)) + OCTET_LENGTH(CAST(COALESCE(column3, '0') AS STRING))) / POWER(1024, 2) AS mb,
  SUM(OCTET_LENGTH(CAST(column1 AS STRING)) + OCTET_LENGTH(CAST(column2 AS STRING)) + OCTET_LENGTH(CAST(COALESCE(column3, '0') AS STRING))) / POWER(1024, 3) AS gb
FROM catalog.schema.table
WHERE ts >= '2025-02-04 00:00:00' AND ts < '2025-02-05 00:00:00';
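Note on NULLs: OCTET_LENGTH returns NULL for a NULL input, and NULL propagates through +, so any row with a NULL in column1 or column2 contributes nothing to the SUM. If those columns are nullable, wrap each one the same way, for example OCTET_LENGTH(COALESCE(CAST(column1 AS STRING), '')). Also keep in mind this measures the byte length of each value's string representation, which is only an approximation of the physical storage size.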
Let me know if you need anything else; otherwise, please mark this as the solution.

