
Job performance issue: Configurations

Vasu_Kumar_T
New Contributor II

Hello All,

 

One job was taking more than 7 hours. After we added the configuration below, it completed in under 2:30, but after deployment with the same parameters it is again taking 7+ hours.

1) spark.conf.set("spark.sql.shuffle.partitions", 20000)  # increased from the previous 500

2) spark.catalog.clearCache()  # drop cached tables
for (id, rdd) in spark.sparkContext._jsc.getPersistentRDDs().items():
    rdd.unpersist()  # also unpersist any cached RDDs
    print("Unpersisted {} rdd".format(id))

3) from pyspark.sql import functions as F   # import needed for F.rand()
DF = DF.withColumn('salt', F.rand())        # add a random salt column
DF = DF.repartition(100, 'salt')            # repartition on the salt to spread skewed keys

Tried with a fixed 20 nodes; it is still taking 7+ hrs after deployment (no change in the notebook or cluster configuration).

Before deployment, with autoscaling from 1 to 20 nodes, it was also completing in under 2:30.

 

Any suggestions are appreciated. Thanks

 

Vasu

1 REPLY

lingareddy_Alva
Honored Contributor III

Hi @Vasu_Kumar_T 

This is a classic Spark performance inconsistency issue. The fact that it works fine in your notebook but degrades after deployment suggests several potential causes. Here are the most likely culprits and solutions:

Primary Suspects
1. Data Skew Variations
Your salt-based repartitioning may not be consistently effective if the underlying data distribution changes between runs or environments; note that F.rand() without a seed produces a different salt on every execution (see the skew-check sketch after this list).

2. Cluster Resource Allocation
A fixed 20-node cluster doesn't guarantee the same resource allocation that the auto-scaling cluster was providing.

3. Memory and Executor Configuration
Executor memory and core settings on the deployed job cluster can differ from those on your interactive cluster even when the node counts match (the diagnostic sketch at the end of this reply prints these values).
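
To confirm whether skew is really the problem, you could profile the distribution of the join/aggregation key, and if it is, switch from F.rand() to a deterministic hash-based salt so the partitioning behaves the same on every run. A minimal sketch, assuming a DataFrame DF and a hypothetical key column customer_id:

from pyspark.sql import functions as F

# Profile the key distribution: a few keys holding most of the rows indicates skew
(DF.groupBy("customer_id")            # hypothetical join/aggregation key
   .count()
   .orderBy(F.desc("count"))
   .show(20, truncate=False))

# Deterministic salt: the same row gets the same salt on every run,
# unlike F.rand(), which generates new values each execution
num_salts = 100
DF = DF.withColumn("salt", F.abs(F.hash("customer_id")) % num_salts)
DF = DF.repartition(num_salts, "salt")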

Environment-Specific Solutions
Check these deployment differences (the sketch below prints the Spark version and key settings at runtime so you can compare the two environments):
- Spark version consistency between notebook and deployment
- Network bandwidth between nodes in production vs. development
- Storage type (SSD vs. HDD) and I/O throughput
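
One way to rule out silent configuration drift is to log the effective Spark version and the settings you rely on at the start of the job, in both the notebook and the deployed run, and diff the output. A minimal sketch; the config keys listed here are assumptions about which settings matter for this job:

# Print the effective runtime settings, then compare notebook vs. deployed output
print("Spark version:", spark.version)

for key in [
    "spark.sql.shuffle.partitions",
    "spark.sql.adaptive.enabled",
    "spark.executor.memory",
    "spark.executor.cores",
]:
    # 'unset' means the value comes from a cluster-level or built-in default
    print(key, "=", spark.conf.get(key, "unset"))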

 

LR