Hi @Dribka @William_Scardua
import numpy

# Collect the DataFrame to the driver as pandas and measure each column's
# in-memory footprint; deep=True also counts the contents of object/string columns.
actual_size_of_each_columns = df.toPandas().memory_usage(deep=True).to_dict()
del actual_size_of_each_columns["Index"]  # drop the pandas index entry, keep only real columns

for key in actual_size_of_each_columns:
    print(f"Size of the Column `{key}` -> {actual_size_of_each_columns[key]} bytes")

print(f"\nSize of the DataFrame -> {numpy.sum(list(actual_size_of_each_columns.values()))} bytes")
This code can help you find the actual in-memory size of each column and of the whole DataFrame. Because toPandas() pulls the data to the driver, the figures reflect pandas' uncompressed representation, so they are effectively an upper bound; Spark's internal optimizations typically keep the actual footprint smaller.
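Note that df.toPandas() collects the entire DataFrame to the driver, which may not be practical for very large tables. A minimal sketch of one possible workaround is below: measure a sample and extrapolate linearly. The 1% fraction, the seed, and the linear scaling are illustrative assumptions, not part of the original snippet.

# Sketch: estimate per-column size from a sample instead of collecting everything.
# The fraction value and the linear extrapolation are assumptions for illustration.
fraction = 0.01
sample_pdf = df.sample(fraction=fraction, seed=42).toPandas()

sampled_sizes = sample_pdf.memory_usage(deep=True).to_dict()
del sampled_sizes["Index"]  # drop the pandas index entry

for column, size_bytes in sampled_sizes.items():
    estimated = size_bytes / fraction  # scale the sampled size up to the full DataFrame
    print(f"Estimated size of the Column `{column}` -> {estimated:.0f} bytes")

print(f"\nEstimated size of the DataFrame -> {sum(sampled_sizes.values()) / fraction:.0f} bytes")

The estimate assumes the sampled rows are representative of the whole DataFrame; skewed string columns can make it less accurate.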