Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Partition Size:

subhas_hati
New Contributor

Hi,

I am using the default partition size of 128 MB. I am reading a 3.8 GB file and checking the number of partitions with df.rdd.getNumPartitions() as shown below, then dividing the file size by the partition count. The resulting partition size is 159 MB.

Why does the partition size after reading the file differ from the 128 MB default?

# Check the default maximum partition size (the value is returned with a trailing "b")
partition_size = spark.conf.get("spark.sql.files.maxPartitionBytes").replace("b", "")
print(f"Partition Size: {partition_size} in bytes and {int(partition_size) / 1024 / 1024} in MB")

# Average partition size in MB (file_size is the file's size in bytes)
partition_size = file_size / 1024 / 1024 / df.rdd.getNumPartitions()
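As a sanity check on the numbers in the question, the partition count you would expect at the 128 MB default, versus the count implied by the observed 159 MB average, can be worked out with plain arithmetic (a sketch using only the figures quoted above; actual counts depend on Spark's split planning):

```python
import math

file_size_mb = 3.8 * 1024          # 3.8 GB file, expressed in MB
max_partition_mb = 128             # spark.sql.files.maxPartitionBytes default

# Partition count you would expect if Spark split strictly at 128 MB
expected_partitions = math.ceil(file_size_mb / max_partition_mb)
print(expected_partitions)                   # 31
print(file_size_mb / expected_partitions)    # ~125.5 MB average

# Partition count implied by the observed 159 MB average
print(round(file_size_mb / 159))             # 24
```

So a 159 MB average means Spark produced roughly 24 partitions instead of the 31 that a strict 128 MB split would give.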

1 REPLY

Sidhant07
Databricks Employee

Hi @subhas_hati ,

The partition size of a 3.8 GB file read into a DataFrame can differ from the 128 MB default, yielding an observed partition size of 159 MB, because of the spark.sql.files.openCostInBytes configuration.

• spark.sql.files.maxPartitionBytes: the maximum number of bytes to pack into a single partition when reading files. The default is 128 MB.

• spark.sql.files.openCostInBytes: an internal configuration that estimates the cost of opening a file, measured as the number of bytes that could be scanned in the same time. Its default is 4 MB, and it is added as a per-file overhead in the partition-size calculation.

When Spark plans a file scan, it adds the spark.sql.files.openCostInBytes overhead to the total bytes to be read, which can produce partitions larger than the spark.sql.files.maxPartitionBytes setting alone would suggest. This is why the observed partition size can be 159 MB instead of the expected 128 MB.

Sources:
1. spark.sql.files.maxPartitionBytes
2. spark.sql.files.openCostInBytes
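How the two settings interact can be sketched after the split-size logic in open-source Spark 3.x (FilePartition.maxSplitBytes); treat this as an approximation of what Databricks runs, and the parallelism value below as an assumption:

```python
def max_split_bytes(total_bytes, num_files, default_parallelism,
                    max_partition_bytes=128 * 1024 * 1024,  # spark.sql.files.maxPartitionBytes
                    open_cost_in_bytes=4 * 1024 * 1024):    # spark.sql.files.openCostInBytes
    """Approximate Spark's target split size for a file scan.

    Each file is charged open_cost_in_bytes on top of its actual size,
    so scans over many small files end up with fewer, larger partitions.
    """
    bytes_per_core = (total_bytes + num_files * open_cost_in_bytes) / default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))

# One 3.8 GB file on an assumed 8-core cluster: the per-core share exceeds
# 128 MB, so the split size is capped at maxPartitionBytes.
split = max_split_bytes(total_bytes=int(3.8 * 1024**3), num_files=1,
                        default_parallelism=8)
print(split / 1024 / 1024)    # 128.0
```

The open-cost overhead matters most when reading many small files, where the 4 MB charged per file inflates each partition's accounted size; observed averages also depend on file splittability and compression, which is how they can land above the 128 MB cap.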
