Data Engineering

PySpark operations slower on a 14.3 LTS cluster than on 13.3 LTS

anish2102
New Contributor III

In my notebook, I am performing a few join operations that take more than 30 s on a 14.3 LTS cluster, whereas the same operations take less than 4 s on a 13.3 LTS cluster. Can someone help me optimize PySpark operations such as joins and withColumn?

1 ACCEPTED SOLUTION

anish2102
New Contributor III

I have found the issue. It was in my code: the DataFrame was being referenced multiple times in withColumn and join operations in the form dataframe['col_name'], which was creating more than 20 Spark jobs and degrading the notebook's performance. When I refer to columns using the col() function in both the join and withColumn calls, it runs much faster than before and creates only 1 or 2 Spark jobs.


4 REPLIES

jose_gonzalez
Moderator

Check the physical query plan on both DBR 14.3 and DBR 13.3 and compare them to see whether they differ. If they do, check the Spark UI to identify where the plan changed.

Lakshay
Esteemed Contributor

Are you comparing the performance against the same dataset?

Lakshay
Esteemed Contributor

Thank you for sharing the analysis.
