Hi all
I have 2 clusters that look identical, but one runs my UDF in parallel and the other does not. The one that does is a personal cluster; the bad one is a shared cluster.
import pandas as pd
from datetime import datetime
from time import sleep
import threading

# test function: sleep 1s per group, then return the ids with a timestamp and the worker thread id
def func(x: pd.DataFrame):
    sleep(1)
    return pd.DataFrame({'id': x['id'],
                         'timestamp': str(datetime.now()),
                         'thread': threading.get_native_id()})

# native
sdf = spark.range(start=0, end=40, step=1, numPartitions=8)
now = datetime.now()
sdf = sdf.groupby('id').applyInPandas(func, schema="id int, timestamp string, thread int")
result = spark.createDataFrame(sdf.toPandas())  # toPandas() forces execution
print((datetime.now() - now).total_seconds())
display(result.groupBy("thread").count())
The personal cluster splits the work across 4 threads (one per CPU), but the shared one doesn't.
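For reference, this is what the expected parallel behavior looks like outside Spark: 4 worker threads chew through 8 one-second groups in about 2 seconds instead of 8. A minimal local sketch (plain Python threads, not Databricks-specific), matching what the personal cluster does:

```python
import pandas as pd
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime
from time import sleep

def func(x: pd.DataFrame) -> pd.DataFrame:
    sleep(1)  # simulate 1s of per-group work
    return pd.DataFrame({'id': x['id'],
                         'timestamp': str(datetime.now()),
                         'thread': threading.get_native_id()})

# 8 single-row "groups", processed by a pool of 4 threads
groups = [pd.DataFrame({'id': [i]}) for i in range(8)]
start = datetime.now()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(func, groups))
elapsed = (datetime.now() - start).total_seconds()

threads_used = {r['thread'].iloc[0] for r in results}
print(f"{elapsed:.1f}s wall time, {len(threads_used)} distinct threads")
```

With 4 workers the wall time lands near 2s; run serially (one worker) it would be ~8s, which is the symptom the shared cluster shows.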


Here is the personal vs shared cluster configuration; I don't get what makes them behave differently.

Note that in the real code I use repartition to achieve the same effect; it also works on the personal cluster but not on the shared one.
Please help!!
_sqldf.repartition(max_number_of_threads, "batch_id").groupBy("batch_id").applyInPandas(..)
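One thing worth checking regardless of cluster type (an assumption about the mechanism, not a confirmed diagnosis): repartition(n, col) hash-partitions rows by the column value, so the number of busy tasks is capped by the number of distinct batch_id values, and hash collisions can leave some of the n partitions empty. A pure-Python sketch of the capping effect (Spark uses Murmur3 internally; Python's hash() here only illustrates the idea):

```python
# hash partitioning: each key lands in exactly one partition
def partition_for(key, n_partitions):
    return hash(key) % n_partitions

n_partitions = 4            # stand-in for max_number_of_threads
batch_ids = ["a", "b"]      # only two distinct batch_id values

occupied = {partition_for(b, n_partitions) for b in batch_ids}
# at most len(batch_ids) partitions can hold data,
# so at most 2 applyInPandas tasks can run concurrently here
print(len(occupied))
```

If both clusters see the same data this can't explain the difference by itself, but it's worth ruling out that the shared cluster's query is arriving with fewer distinct batch_id values than expected.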