Different behavior on personal cluster vs job cluster
08-20-2025 09:58 AM
Hi guys!
I am facing a weird bug here!
I own a notebook that runs perfectly on a personal cluster. As an example, I've printed some of the data output during the extraction:
Code:
cursor.execute(sql)
results = cursor.fetchall()
cols = [desc[0] for desc in cursor.description]
dfspark = spark.createDataFrame(results, cols)
Output in personal cluster:
<class 'pyspark.sql.connect.dataframe.DataFrame'>
As you can see, when running in a job cluster, the data is not being converted to a Spark DataFrame (it is held as a pyspark.sql.connect.dataframe.DataFrame instead).
08-20-2025 09:55 PM
Hi team,
In interactive notebooks on personal clusters, you’re attached directly to the Spark driver inside the cluster, and the Spark session is the classic (legacy) PySpark session.
In job clusters, especially on newer runtimes (e.g. DBR 14.x+) or on SQL warehouses, Databricks may automatically use Spark Connect. In that case your client (pyspark.sql.connect) holds the DataFrame object, and operations are lazily sent to the remote Spark cluster for execution. The object is still a Spark DataFrame; only its Python class differs.
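If the notebook needs to know which kind of session it is running under, one option is to look at the module of the DataFrame's class. The sketch below is a minimal illustration, not an official Databricks API: the helper name dataframe_backend and its return labels are hypothetical, and it only inspects the class's module path, so it runs even without pyspark installed.

```python
def dataframe_backend(df) -> str:
    """Guess whether `df` comes from Spark Connect or a classic PySpark session.

    Hypothetical helper for illustration: it only inspects the module
    path of the object's class and does not call into Spark.
    """
    module = type(df).__module__
    # Spark Connect DataFrames live under pyspark.sql.connect.*
    if module.startswith("pyspark.sql.connect"):
        return "spark-connect"
    # Any other pyspark.sql.* class is treated as a classic DataFrame
    if module.startswith("pyspark.sql"):
        return "classic"
    return "unknown"


if __name__ == "__main__":
    # Stand-in classes that mimic the two module paths, so the example
    # can be run without a Spark cluster.
    FakeConnectDF = type("DataFrame", (), {"__module__": "pyspark.sql.connect.dataframe"})
    FakeClassicDF = type("DataFrame", (), {"__module__": "pyspark.sql.dataframe"})
    print(dataframe_backend(FakeConnectDF()))
    print(dataframe_backend(FakeClassicDF()))
```

In the notebook from the question, this would be called as dataframe_backend(dfspark) right after spark.createDataFrame(results, cols).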