StringIndexer method fails with shared compute

sandeephenkel23
New Contributor III

Dear Team

StringIndexer from Spark MLlib works when I run my code on a Databricks cluster in No Isolation Shared access mode, but it fails on a Unity Catalog-enabled Databricks cluster in Shared access mode.

Here is the import I use: from pyspark.ml.feature import StringIndexer

The error is: py4j.security.Py4JSecurityException: Constructor public org.apache.spark.ml.feature.StringIndexer(java.lang.String) is not whitelisted.


shashank853
Databricks Employee

Hi,

The issue you're encountering, where StringIndexer (from Spark MLlib, pyspark.ml.feature) fails on a Unity Catalog-enabled Databricks cluster in Shared access mode, is most likely caused by the limitations of Shared access mode in Unity Catalog.

Shared Access Mode Limitations on Unity Catalog:

- Databricks Runtime ML and the Spark Machine Learning Library (MLlib) are not supported in Shared access mode on Unity Catalog. StringIndexer is part of Spark MLlib, which is consistent with the error you see: its JVM constructor is not on the list of calls allowed in this mode.
- Spark-submit jobs are not supported in Shared access mode on Unity Catalog.
- PySpark UDFs cannot access Git folders, workspace files, or volumes to import modules in Databricks Runtime 14.2 and below.
- DBFS root and mounts do not support FUSE in Shared access mode.
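For context, here is roughly what StringIndexer computes, written in plain Python with no Spark involved. This is an illustrative sketch, not the Spark implementation, and the function names fit_string_index and apply_string_index are hypothetical. It assumes the documented default stringOrderType="frequencyDesc" (most frequent label gets index 0.0, ties sorted alphabetically) and mimics the "error", "skip" and "keep" options of handleInvalid for unseen labels. It ignores null handling and distributed execution, so treat it only as a reference for the expected mapping.

```python
from collections import Counter
from typing import Dict, Iterable, List


def fit_string_index(values: Iterable[str]) -> Dict[str, float]:
    """Hypothetical helper: build a label -> index mapping like
    StringIndexer's default frequencyDesc ordering."""
    counts = Counter(values)
    # Sort by descending frequency, breaking ties alphabetically.
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}


def apply_string_index(values: Iterable[str], mapping: Dict[str, float],
                       handle_invalid: str = "error") -> List[float]:
    """Hypothetical helper: encode values with a fitted mapping.

    handle_invalid mimics StringIndexer's options for unseen labels:
      "error" -> raise, "skip" -> drop the value, "keep" -> index len(mapping).
    """
    encoded: List[float] = []
    for value in values:
        if value in mapping:
            encoded.append(mapping[value])
        elif handle_invalid == "keep":
            encoded.append(float(len(mapping)))
        elif handle_invalid == "skip":
            continue
        else:
            raise ValueError(f"Unseen label: {value!r}")
    return encoded


if __name__ == "__main__":
    mapping = fit_string_index(["b", "a", "c", "a", "b", "a"])
    print(mapping)  # {'a': 0.0, 'b': 1.0, 'c': 2.0}
    print(apply_string_index(["a", "c", "d"], mapping, "keep"))  # [0.0, 2.0, 3.0]
```

This does not make MLlib usable on a Shared access mode cluster; it only shows the mapping you would need to reproduce if you encoded labels some other way.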

For more details, see: https://docs.databricks.com/en/compute/access-mode-limitations.html
