Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to estimate DataFrame size in bytes?

William_Scardua
Valued Contributor

Hi guys,

How do I estimate the size in bytes of my DataFrame (PySpark)?

Any ideas?

Thank you

3 REPLIES

Dribka
New Contributor III

@William_Scardua you can get a rough estimate of a PySpark DataFrame's size in bytes from its schema. df.dtypes returns each column's name and data type. Assign a byte width to each type, sum the widths to get a per-row size, and multiply by the number of rows (df.count()). Variable-width types such as strings have no fixed size, so you have to assume an average length for them.

You can also check df.storageLevel to see whether the DataFrame is persisted in memory or on disk, since that affects the actual storage footprint. Keep in mind that this is only an estimate: actual memory usage varies with factors like compression and Spark's internal optimizations. If you need a more precise measurement, consider using pyspark.sql.functions to measure individual columns (for example, length() on string columns) and summing the results.
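The schema-based estimate described above could be sketched roughly as follows. This is only an illustration: the function name estimate_size_bytes, the byte widths in FIXED_WIDTH_BYTES, and the 32-byte default for variable-width columns are all assumptions made for this example, not values Spark reports. The function takes the list of (name, type) pairs that df.dtypes returns, so it runs without a cluster.

```python
import re

# Nominal byte widths for fixed-width Spark SQL types. These are assumptions
# for a rough estimate; Spark's internal row and cache formats differ.
FIXED_WIDTH_BYTES = {
    "boolean": 1, "tinyint": 1, "smallint": 2, "int": 4, "bigint": 8,
    "float": 4, "double": 8, "date": 4, "timestamp": 8,
}


def estimate_size_bytes(dtypes, row_count, variable_width_bytes=32):
    """Estimate total size from (column_name, type_string) pairs.

    `dtypes` has the shape returned by `df.dtypes`. Types not listed in
    FIXED_WIDTH_BYTES (string, binary, decimal, arrays, ...) are charged
    `variable_width_bytes` each, which is a guess you should tune.
    """
    row_bytes = 0
    for _name, type_string in dtypes:
        # Strip parameters such as "decimal(10,2)" or "array<int>".
        base_type = re.split(r"[(<]", type_string, maxsplit=1)[0]
        row_bytes += FIXED_WIDTH_BYTES.get(base_type, variable_width_bytes)
    return row_bytes * row_count


# Hypothetical schema; with a real DataFrame you would call
# estimate_size_bytes(df.dtypes, df.count()) instead.
example_dtypes = [("id", "bigint"), ("price", "double"), ("name", "string")]
print(estimate_size_bytes(example_dtypes, row_count=1000))
```

The result ignores compression, null handling, and per-row overhead, so treat it as an order-of-magnitude figure.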

BroData
New Contributor II

Hi @Dribka @William_Scardua 

# Note: toPandas() collects the whole DataFrame onto the driver.
# index=False leaves the pandas index out of the per-column sizes.
column_sizes = df.toPandas().memory_usage(index=False, deep=True).to_dict()
for column, size in column_sizes.items():
    print(f"Size of the column `{column}` -> {size} bytes")
print(f"\nSize of the DataFrame -> {sum(column_sizes.values())} bytes")

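This code reports the size of each column and of the whole DataFrame once it has been converted to pandas. Note that toPandas() collects all rows onto the driver, so the figures reflect pandas' in-memory representation rather than Spark's internal storage, and the approach only works when the data fits in driver memory.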
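When the DataFrame is too large to convert in full, one possible variation is to measure a small random sample with pandas and scale the result by the total row count. The sketch below assumes rows are roughly uniform in size; the helper names extrapolate_bytes and estimate_from_sample, and the default 1% fraction, are hypothetical choices for this example.

```python
def extrapolate_bytes(sample_bytes, sample_rows, total_rows):
    """Scale the measured size of a sample up to the full row count."""
    if sample_rows == 0:
        # An empty sample gives no information; report 0 rather than divide by zero.
        return 0
    return int(sample_bytes / sample_rows * total_rows)


def estimate_from_sample(df, fraction=0.01, seed=42):
    """Estimate the pandas in-memory size of a Spark DataFrame from a sample.

    `df` is assumed to be a pyspark.sql.DataFrame. Only the sampled rows are
    collected to the driver, but df.count() still scans the full dataset.
    """
    total_rows = df.count()
    sample_pdf = df.sample(fraction=fraction, seed=seed).toPandas()
    sample_bytes = int(sample_pdf.memory_usage(index=False, deep=True).sum())
    return extrapolate_bytes(sample_bytes, len(sample_pdf), total_rows)


# The scaling step on its own: 10 sampled rows took 500 bytes, 1000 rows in total.
print(extrapolate_bytes(500, 10, 1000))
```

As with the full toPandas() approach, the number describes pandas memory on the driver, not the size of the data as Spark stores it.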

