StringIndexer method fails with shared compute
10-18-2024 03:50 AM - edited 10-18-2024 03:53 AM
Dear Team,
StringIndexer (part of Spark MLlib, not MLflow) works when I run my code on a Databricks cluster in No Isolation Shared access mode, but it fails on a Unity Catalog-enabled Databricks cluster in Shared access mode.
Here is the import: from pyspark.ml.feature import StringIndexer
The error is: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.feature.StringIndexer(java.lang.String) is not whitelisted.
10-18-2024 03:59 AM
Hi,
The failure you're seeing with StringIndexer (from Spark MLlib, pyspark.ml.feature) on a Unity Catalog-enabled Databricks cluster in Shared access mode is most likely caused by the limitations of Shared access mode in Unity Catalog.
Shared Access Mode Limitations on Unity Catalog:
- Databricks Runtime ML and the Spark Machine Learning Library (MLlib) are not supported in Shared access mode on Unity Catalog. StringIndexer is part of Spark MLlib, so this limitation applies to it directly: the Py4JSecurityException shows that the JVM constructor PySpark calls is not on the allowlist that Shared access mode enforces.
- Spark-submit jobs are not supported in Shared access mode on Unity Catalog.
- PySpark UDFs cannot access Git folders, workspace files, or volumes to import modules in Databricks Runtime 14.2 and below.
- DBFS root and mounts do not support FUSE in Shared access mode.
For more details, see: https://docs.databricks.com/en/compute/access-mode-limitations.html
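If moving this workload to a cluster whose access mode supports MLlib is not an option, one possible workaround is to compute the same label-to-index mapping yourself instead of calling StringIndexer. This is a minimal sketch in plain Python, assuming the number of distinct labels is small enough to collect to the driver. build_label_index and encode are hypothetical helper names, not a Spark or Databricks API, and the ordering rule is intended to match StringIndexer's default stringOrderType="frequencyDesc" (most frequent label gets 0, ties sorted alphabetically).

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_label_index(label_counts: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Assign an index to each label, most frequent label first.

    Hypothetical helper intended to mirror StringIndexer's default
    stringOrderType="frequencyDesc": higher counts get lower indices,
    and equal counts are ordered alphabetically.
    """
    ordered = sorted(label_counts, key=lambda lc: (-lc[1], lc[0]))
    return {label: i for i, (label, _count) in enumerate(ordered)}


def encode(labels: Iterable[str], index: Dict[str, int]) -> List[float]:
    """Replace each label with its index as a float (StringIndexer's output
    column is a double). Unseen labels raise, similar to handleInvalid="error"."""
    encoded = []
    for label in labels:
        if label not in index:
            raise ValueError(f"unseen label: {label!r}")
        encoded.append(float(index[label]))
    return encoded


# Toy data: "a" appears 3 times, "c" twice, "b" once.
rows = ["a", "b", "c", "a", "a", "c"]
index = build_label_index(Counter(rows).items())
print(index)                 # {'a': 0, 'c': 1, 'b': 2}
print(encode(rows, index))   # [0.0, 2.0, 1.0, 0.0, 0.0, 1.0]
```

On the cluster, the (label, count) pairs could come from df.where("category IS NOT NULL").groupBy("category").count().collect(), since each collected Row unpacks like a tuple, and the mapping could be applied by joining df with spark.createDataFrame(list(index.items()), ["category", "categoryIndex"]). This assumes a hypothetical string column named category and that these core DataFrame operations are permitted in Shared access mode on your runtime version.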

