Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Different behavior on personal cluster vs job cluster

FRB1984
New Contributor II

Hi guys!
I am facing a weird bug here!
I have a notebook that runs perfectly on a personal cluster. As an example, I've printed the type of the data output during the extraction:

code:

cursor.execute(sql)
results = cursor.fetchall()
# column names come from the cursor description
cols = [desc[0] for desc in cursor.description]
dfspark = spark.createDataFrame(results, cols)
print(type(dfspark))

Output in personal cluster:

<class 'pyspark.sql.connect.dataframe.DataFrame'>


As you can see, when running in the job cluster, the data is not being converted to a classic Spark DataFrame (it is being held as pyspark.sql.connect.dataframe.DataFrame).
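For what it's worth, a quick way to tell which flavor of DataFrame you ended up with is to look at the class's module path. This is a hypothetical diagnostic helper (`is_connect_dataframe` is a made-up name, not a Databricks or PySpark API):

```python
def is_connect_dataframe(df) -> bool:
    """Return True when `df` is a Spark Connect DataFrame.

    Spark Connect DataFrames live under the pyspark.sql.connect package,
    while classic DataFrames live under pyspark.sql.dataframe.
    (Hypothetical helper for diagnostics only.)
    """
    return type(df).__module__.startswith("pyspark.sql.connect")

# In the notebook you would call, e.g.: is_connect_dataframe(dfspark)
```

Running this on both clusters would confirm whether the job cluster is handing you a Spark Connect proxy object.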


Vidhi_Khaitan
Databricks Employee

Hi team,

In interactive notebooks on personal clusters, you're attached directly to the Spark driver inside the cluster, and the Spark session is the classic PySpark session.
On job clusters, especially with newer runtimes (e.g. DBR 14.x+) or SQL warehouses, Databricks may automatically use Spark Connect. In that case your client (pyspark.sql.connect) holds the DataFrame object, and operations are lazily pushed to the remote Spark cluster.
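One practical consequence: code that type-checks against the classic DataFrame class can behave differently under Spark Connect, even though the DataFrame API itself is the same on both. A minimal sketch of writing against the shared API instead (`row_count` is an illustrative helper, not a Databricks API):

```python
def row_count(df) -> int:
    # Portable across classic and Spark Connect sessions: both DataFrame
    # classes expose the same methods (count, select, filter, ...), so
    # relying on the shared API rather than isinstance checks against
    # pyspark.sql.DataFrame keeps the notebook working on both personal
    # and job clusters.
    return df.count()
```

In other words, as long as the notebook only uses DataFrame operations, the pyspark.sql.connect.dataframe.DataFrame you see on the job cluster should behave the same; it is only class-identity checks or driver-local tricks that need adjusting.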