Re: Different behavior on personal cluster vs job ...

FRB1984 · 08-20-2025

Hi guys!
I am facing a weird bug here!
I have a notebook that runs perfectly on a personal cluster but behaves differently when run as a job. As an example, I've printed some of the data output during the extraction:

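Code:

# cursor is an open DB-API cursor and sql is the extraction query (both defined earlier)
cursor.execute(sql)
results = cursor.fetchall()                      # rows returned by the driver
cols = [desc[0] for desc in cursor.description]  # column names from the cursor metadata
dfspark = spark.createDataFrame(results, cols)
print(type(dfspark))                             # the print shown in the output below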

Output in job cluster:

<class 'pyspark.sql.connect.dataframe.DataFrame'>

As you can see, when running on a job cluster, the data is not converted to a regular Spark DataFrame and is instead held as a pyspark.sql.connect.dataframe.DataFrame.
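The extraction step in the question (execute the query, fetch all rows, read column names from cursor.description) can be tried outside Databricks with any DB-API 2.0 driver. This is a minimal sketch that assumes Python's built-in sqlite3 as a stand-in for the real source database, which the question doesn't name; the helper fetch_rows_and_columns and the sales table are hypothetical. It stops just before spark.createDataFrame, which should accept the same (rows, column names) arguments in both classic and Spark Connect sessions.

```python
import sqlite3
from typing import Any, List, Tuple


def fetch_rows_and_columns(cursor: Any, sql: str) -> Tuple[List[Any], List[str]]:
    """Hypothetical helper: run sql on a DB-API 2.0 cursor, return (rows, column_names).

    Each entry of cursor.description is a 7-item sequence whose first item
    is the column name, which is what the question's list comprehension reads.
    """
    cursor.execute(sql)
    rows = cursor.fetchall()
    cols = [desc[0] for desc in cursor.description]
    return rows, cols


if __name__ == "__main__":
    # Assumption: an in-memory sqlite3 database replaces the question's source system.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
    cur.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 12.0)])
    rows, cols = fetch_rows_and_columns(cur, "SELECT id, amount FROM sales ORDER BY id")
    print(cols)  # ['id', 'amount']
    print(rows)  # [(1, 9.5), (2, 12.0)]
    # On Databricks the next step would be: spark.createDataFrame(rows, cols)
    conn.close()
```

Because the helper only depends on the DB-API interface, the rows and column names it returns are the same on both clusters; any difference appears only after createDataFrame.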

Vidhi_Khaitan · 08-20-2025

Hi team,

In interactive notebooks on personal clusters, you’re attached directly to the Spark driver inside the cluster, and the Spark session is the classic (legacy) PySpark session.
In job clusters, especially on newer runtimes (e.g. DBR 14.x and later) or SQL warehouses, Databricks may use Spark Connect automatically. In that case your client (pyspark.sql.connect) holds the DataFrame object, which describes the query rather than holding the data, and operations are pushed lazily to the remote Spark cluster, where they run when an action is triggered.
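In other words, the object is still a Spark DataFrame, just a different class. Code that compares classes, for example isinstance(df, pyspark.sql.dataframe.DataFrame), may not treat the two the same way; as far as I know, in PySpark 3.4 and 3.5 the Connect class is not a subclass of the classic one. A minimal sketch, assuming you only need to know which implementation you got, is to inspect the module the class is defined in, which is the path printed in the question. The function dataframe_flavor and its return labels are hypothetical.

```python
from typing import Any


def dataframe_flavor(df: Any) -> str:
    """Hypothetical helper: report which PySpark DataFrame implementation df is.

    Only the class name and module string are inspected, so nothing from
    pyspark (or its Spark Connect client) has to be importable here.
    """
    cls = type(df)
    module = cls.__module__ or ""
    if cls.__name__ != "DataFrame" or not module.startswith("pyspark.sql"):
        return "other"
    if module.startswith("pyspark.sql.connect"):
        # e.g. pyspark.sql.connect.dataframe.DataFrame (Spark Connect client)
        return "spark-connect"
    # e.g. pyspark.sql.dataframe.DataFrame (classic, driver-attached session)
    return "classic"
```

In the notebook you would call print(dataframe_flavor(dfspark)) on both clusters. Checking the module string avoids importing pyspark.sql.connect, which may need extra client dependencies (such as grpcio) where Spark Connect isn't set up. Beyond type checks, APIs that need direct JVM access, such as df.rdd or spark.sparkContext, are not available under Spark Connect, so those are typical places where job runs diverge.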
