Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Workflow fails when run on a job cluster but not on a shared cluster

ShaliniC
New Contributor II

Hi,

We have a workflow that calls three notebooks. When we run this workflow on a shared cluster it runs fine, but when run on a job cluster, one of the notebooks fails. This notebook uses the SQL function lpad, which appears to be the source of the error. Has anyone experienced anything similar?

Thanks,

Shalini

4 REPLIES

filipniziol
Contributor III

Hi @ShaliniC ,

The possible causes are:
1. The Databricks Runtime version of the job cluster is different from that of the shared cluster.

2. The libraries installed on the shared cluster are missing in the job cluster.

Could you check the above configuration items and align the job cluster with the shared cluster's configuration?

If it does not help, could you please share the details of the error message?

Hi,

The Databricks Runtime version and the installed libraries are the same on the shared and job clusters. We had also raised a case with Databricks support for this issue, where they identified the error as a syntax error in the lpad function we used.

Here is the explanation we received from Databricks for the difference between the clusters:

Single-access-mode compute uses classic PySpark, while shared clusters use the secure Spark Connect interface for the Python API, so the behaviour is likely inconsistent between Spark Connect and classic PySpark.
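For anyone hitting the same error, it may help to double-check lpad's expected signature: in Spark SQL, lpad(str, len, pad) takes all three arguments. A minimal plain-Python sketch of its semantics (not the Databricks implementation; note that Spark also truncates the string when it is longer than len):

```python
def lpad(s: str, length: int, pad: str) -> str:
    """Emulate Spark SQL's lpad(str, len, pad):
    left-pad s with pad until it is exactly `length` characters,
    truncating s if it is already longer than `length`."""
    if len(s) >= length:
        return s[:length]
    # Repeat the pad string and cut it to the exact number of missing chars.
    padding = (pad * length)[: length - len(s)]
    return padding + s

print(lpad("7", 3, "0"))      # -> 007
print(lpad("hello", 3, "x"))  # -> hel
```

If the job cluster rejects the call while the shared cluster accepts it, comparing your call against this three-argument form is a quick way to spot a missing or mistyped argument.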

Thanks, Shalini

saurabh18cs
Contributor II

Are the notebooks executing sequentially or in parallel in this workflow?

Hi,

The error occurred in the parallel run.

We had also raised a case with Databricks support for this issue, where they identified the error as a syntax error in the lpad function we used.

Here is the explanation we received from Databricks for the difference between the clusters:

Single-access-mode compute uses classic PySpark, while shared clusters use the secure Spark Connect interface for the Python API, so the behaviour is likely inconsistent between Spark Connect and classic PySpark.

Thanks, Shalini
