06-12-2025 04:11 AM
Hi @alexbarev ,
The slowdown is most likely caused by running Python UDFs on a Shared (Standard) access mode cluster with Unity Catalog, which adds extra security and isolation overhead to every UDF call. A Dedicated access mode cluster does not add that isolation layer, so switching to one typically resolves the UDF performance issue.
To further improve performance:
- Set spark.sql.execution.pythonUDF.arrow.enabled to true in the cluster's Spark config, so Python UDFs exchange data with the JVM through Arrow.
- Check the Spark UI for task delays or scheduler bottlenecks related to UDFs.
- Review job logs for high serialization/deserialization times.
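For the Arrow suggestion above, here is a minimal sketch of opting a single UDF into Arrow serialization. It assumes PySpark 3.5 or later (where `udf` accepts `useArrow`); the helper names `clean_text` and `add_clean_column` and the column names are made up for illustration and are not from your job.

```python
def clean_text(value):
    """Plain Python logic wrapped by the UDF; passes None through unchanged."""
    if value is None:
        return None
    return value.strip().lower()


def add_clean_column(df, source_col="name", target_col="name_clean"):
    """Apply clean_text to source_col as an Arrow-optimized Python UDF.

    Hypothetical helper: column names are placeholders for illustration.
    """
    # Imported lazily so clean_text can be exercised without a Spark install.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # useArrow=True requests Arrow serialization for this UDF only,
    # independent of the session-level default.
    clean_udf = udf(clean_text, StringType(), useArrow=True)
    return df.withColumn(target_col, clean_udf(col(source_col)))
```

In a notebook you could instead turn it on for the whole session with `spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")` and then call `add_clean_column(df)` on your own DataFrame. Whether this helps on a Shared cluster depends on how much of the time is serialization versus isolation overhead, so it is worth comparing timings before and after.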