SP_6721
Honored Contributor II

Hi @alexbarev,

The slowdown is most likely caused by running Python UDFs on a Standard (formerly Shared) access mode cluster with Unity Catalog. In that mode, Python UDFs run in an isolated environment for security, which adds overhead to UDF execution. A Dedicated (formerly Single User) access mode cluster does not need that isolation layer, so switching to it often resolves this kind of UDF performance issue.

To further improve performance:

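  • Set spark.sql.execution.pythonUDF.arrow.enabled to true in the cluster's Spark config (Spark 3.5+). Regular Python UDFs then exchange data with the JVM in Arrow batches instead of pickled rows.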
  • In the Spark UI, check the stages that run the UDF (BatchEvalPython or ArrowEvalPython in the SQL plan) for long task durations or high scheduler delay.
  • Review job logs for high serialization/deserialization times.