Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

Py4JError: An error occurred while calling o992.resourceProfileManager

New Contributor III


I am trying to run SparkXGBRegressor and I am getting the following error:

Py4JError: An error occurred while calling o992.resourceProfileManager. Trace:
Method public org.apache.spark.resource.ResourceProfileManager org.apache.spark.SparkContext.resourceProfileManager() is not whitelisted on class class org.apache.spark.SparkContext
    at
    at py4j.Gateway.invoke(
    at py4j.commands.AbstractCommand.invokeMethod(
    at py4j.commands.CallCommand.execute(
    at py4j.ClientServerConnection.waitForCommands(
    at
    at

Here is my custom model class and the code I am running:

from pathlib import Path

import pyspark
from pyspark.ml.feature import StandardScaler, VectorAssembler
from pyspark.sql import SparkSession
from xgboost.spark import SparkXGBRegressor

# MLflowModelSignatureMixin, SparkSerializationMixin, MLflowModelRegistration and
# load_model are custom helpers; their imports are not shown in this post.


class SparkClassificationModelManual(MLflowModelSignatureMixin, SparkSerializationMixin):
    def __init__(self, inputCols, outputCol):
        self.inputCols = inputCols
        self.outputCol = outputCol
        self.featuresCol = "features"
        self.scaledFeaturesCol = "scaledFeatures"
        self.path = None  # Compulsory
        #self._model = LogisticRegression(featuresCol="scaledFeatures", labelCol=self.outputCol)
        self._model = SparkXGBRegressor(features_col="scaledFeatures", label_col=self.outputCol)
        self._scaler = StandardScaler(inputCol=self.featuresCol, outputCol=self.scaledFeaturesCol, withStd=True, withMean=False)
        self.assembler = VectorAssembler(inputCols=self.inputCols, outputCol=self.featuresCol)

    def fit(self, df: pyspark.sql.DataFrame) -> None:
        # Combine feature columns into a single vector column
        assembled_df = self.assembler.transform(df)
        # Fit the scaler, then scale the features
        self._scaler = self._scaler.fit(assembled_df)
        scaled_df = self._scaler.transform(assembled_df)
        # Fit the XGBoost regressor on the scaled features
        self._model = self._model.fit(scaled_df)

    def save(self, path):
        self.path = str(Path(path).parent)

    def load(self, path):
        self.path = str(Path(path).parent)
        return super().load(path)

    def predict(self, test_df):
        # Assuming the model has been fitted and the same transformations are applied to test data
        assembled_test_df = self.assembler.transform(test_df)
        scaled_test_df = self._scaler.transform(assembled_test_df)
        # Fitted Spark ML models score a DataFrame with transform()
        predictions = self._model.transform(scaled_test_df)
        return predictions


if __name__ == '__main__':
    spark = SparkSession.builder.appName("ExampleApp").getOrCreate()
    model = SparkClassificationModelManual(inputCols=["feature1", "feature2", "feature3"], outputCol="label")
    mlflow_reg = MLflowModelRegistration(
        model_reg_tags={"testing": "spark"}
    )
    data = spark.createDataFrame([
        (0, 0.1, 0.3, 1),
        (1, 0.2, 0.5, 0),
        (0, 0.5, 0.8, 1),
        (1, 0.3, 0.7, 0)
    ], ["feature1", "feature2", "feature3", "label"])
    with mlflow_reg:
        ...  # the body of this block is missing from the post

    model = load_model(SparkClassificationModelManual, name="spark_testing_preprocess", version=2)
    res = model.predict(data)

Community Manager

Hi @rahuja, The error you’re encountering might be related to the interaction between PySpark and XGBoost.

Let’s explore some potential solutions:

  1. PySpark Version Compatibility:

  2. Check Cluster Logs:

  3. Debugging UDFs:

  4. XGBoost Parameters:

    • Ensure that you’re setting the correct parameters for the SparkXGBRegressor. Some parameters, such as nthread, are forbidden in the Spark estimator. Refer to the SparkXGBRegressor documentation for details on supported parameters.
    • Also, make sure you’re specifying the correct features column (features_col) and label column (label_col) when creating the SparkXGBRegressor.
  5. Cluster Configuration:

    • Check if your cluster configuration (e.g., resource allocation, memory, cores) is sufficient for running the XGBoost training. Adjust the cluster settings if necessary.
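
As a rough sketch of the parameter advice in item 4, the helper below strips parameters that the Spark estimator rejects before the estimator is built. Only `nthread` is named in this thread, so the `FORBIDDEN_PARAMS` set is an assumption that should be extended after checking the `xgboost.spark` documentation for your version. The names `clean_xgb_params` and `build_regressor` are hypothetical and exist only for this illustration.

```python
# Parameters the Spark estimator rejects. Only "nthread" is named in this thread;
# treat this set as an assumption and extend it from the xgboost.spark docs.
FORBIDDEN_PARAMS = {"nthread"}


def clean_xgb_params(params, forbidden=FORBIDDEN_PARAMS):
    """Return (kept, dropped): the params to pass on and the names removed."""
    kept = {k: v for k, v in params.items() if k not in forbidden}
    dropped = sorted(k for k in params if k in forbidden)
    return kept, dropped


def build_regressor(params, features_col="scaledFeatures", label_col="label"):
    """Build a SparkXGBRegressor from cleaned params (hypothetical helper)."""
    # Imported lazily so clean_xgb_params can be used without xgboost installed.
    from xgboost.spark import SparkXGBRegressor

    kept, dropped = clean_xgb_params(params)
    if dropped:
        print(f"Ignoring parameters not accepted by the Spark estimator: {dropped}")
    return SparkXGBRegressor(features_col=features_col, label_col=label_col, **kept)


kept, dropped = clean_xgb_params({"max_depth": 5, "n_estimators": 100, "nthread": 4})
print(kept)
print(dropped)
```

On a cluster, `build_regressor({"max_depth": 5, "nthread": 4})` would then construct the estimator with the same `features_col` and `label_col` values used in the question, without passing `nthread`.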

If you encounter any specific error messages or need further assistance, feel free to share them, and I’ll be happy to help! 😊

New Contributor III

Hello Kaniz

I am currently using:

  • pyspark: 3.5.0, which is the default in Databricks Runtime 14.3 LTS ML
  • xgboost: 1.7.6

I have also checked the driver logs, and there do not seem to be any problems caused by UDFs. Is there anything else I can try?

I checked, and the code works perfectly fine on a single-node cluster but throws this error on a multi-node cluster. Here are the configurations of the two clusters:

1. Single Node cluster

  • Databricks Runtime version: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)
  • Node Type: Standard_D4ds_v5

The code runs perfectly fine in this one.

2. Multi Node Interactive cluster

  • Databricks Runtime version: 14.3 LTS ML (includes Apache Spark 3.5.0, Scala 2.12)
  • Node Type: Standard_D4ds_v5
  • Min Workers: 1
  • Max Workers: 3

How can two clusters with the same runtime and library versions behave differently, with one running the code perfectly fine and the other throwing this error?
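
One way to narrow this down is to read the same settings on both clusters and diff them, since the runtime and node type listed above are identical. The sketch below does that with plain dictionaries. The conf keys in `KEYS_TO_COMPARE` are assumptions to verify for your workspace: a "not whitelisted" message from Py4J usually suggests that the cluster restricts which JVM methods Python may call, but nothing in this thread confirms that this is the cause here. The helper names `diff_settings` and `collect_settings` are hypothetical.

```python
def diff_settings(left, right):
    """Return {key: (left_value, right_value)} for every key whose values differ."""
    keys = sorted(set(left) | set(right))
    return {k: (left.get(k), right.get(k)) for k in keys if left.get(k) != right.get(k)}


def collect_settings(spark, keys):
    """Read the given Spark conf keys from a live session; unset keys map to '<unset>'."""
    return {k: spark.conf.get(k, "<unset>") for k in keys}


# Conf keys that may be worth comparing. Both are assumptions, not confirmed in this thread.
KEYS_TO_COMPARE = ["spark.master", "spark.databricks.pyspark.enablePy4JSecurity"]

# Settings copied from the two cluster descriptions in this thread.
single_node = {"runtime": "14.3 LTS ML", "node_type": "Standard_D4ds_v5"}
multi_node = {
    "runtime": "14.3 LTS ML",
    "node_type": "Standard_D4ds_v5",
    "min_workers": 1,
    "max_workers": 3,
}

differences = diff_settings(single_node, multi_node)
print(differences)
```

In a notebook attached to each cluster, `collect_settings(spark, KEYS_TO_COMPARE)` returns a dictionary that can be passed to `diff_settings` in the same way. The cluster access mode shown in the cluster UI is also worth comparing by hand.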
