03-06-2025 12:34 PM
Hi everyone,
I'm building a PySpark ML Pipeline whose first stage fills nulls with zero. I wrote a custom class for this because I could not find a built-in Transformer that performs this imputation.
I can log this pipeline with MLflow's log model method and load it for scoring, but when I log it with the Feature Engineering package, the score batch method throws an error saying that the custom class does not exist. I need to log it via the Feature Engineering package so that I can properly leverage feature stores and lineage in Unity Catalog. Is anyone able to help? The sample pipeline code is below. Inputs are loaded using feature lookups and the "create training set" method.
All assistance is appreciated!
Accepted Solutions
03-07-2025 12:19 AM
Hi @WarrenO, thanks for sharing the detailed code!
I was able to reproduce the problem, specifically the following error:
AttributeError: module '__main__' has no attribute 'CustomAdder'
File <command-1315887242804075>, line 39
35 evaluator = RegressionEvaluator(
36 labelCol="alcohol", predictionCol="prediction")
38 # Log metrics
---> 39 rmse = evaluator.evaluate(predictions, {evaluator.metricName: "rmse"})
41 # Log model metrics
42 mlflow.log_metric("root_mean_squared_error", rmse)
I did some research internally and found that a similar issue has already been reported, and it is confirmed that custom classes are unfortunately not currently supported with Feature Store score_batch. The reason is that FeatureEngineeringClient's score_batch executes the transform using remote UDFs, and the workers cannot load the custom class definitions there. There is also no way to manually specify additional dependencies with FeatureEngineeringClient's log_model. We would need something like the PyFunc flavor's additional code_path parameter, but it is not available here.
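To illustrate why a class defined only in the notebook cannot be found elsewhere, here is a minimal plain-Python sketch. It does not use Spark or the Feature Engineering client, and it is only an assumption that score_batch fails in this same way internally. The `resolve_class` helper and the `fake_notebook` module are hypothetical names made up for the illustration; `CustomAdder` is borrowed from the traceback above. The sketch shows that resolving a class by its qualified name works in the process where the class was defined, and raises the same kind of AttributeError in a process whose module of that name lacks the definition.

```python
import importlib
import sys
import types


def resolve_class(qualified_name: str):
    """Look up a class from a dotted name such as 'pkg.module.ClassName'.

    Hypothetical helper: it mimics loading a saved object whose class is
    recorded only by name, not by its source code.
    """
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# "Driver" side: a stand-in for the notebook module that defines the class.
notebook = types.ModuleType("fake_notebook")


class CustomAdder:
    """Stand-in for the custom pipeline stage named in the traceback."""


notebook.CustomAdder = CustomAdder
sys.modules["fake_notebook"] = notebook

# The lookup succeeds where the class definition exists.
driver_result = resolve_class("fake_notebook.CustomAdder")

# "Worker" side: a module with the same name, but without the class.
sys.modules["fake_notebook"] = types.ModuleType("fake_notebook")

try:
    resolve_class("fake_notebook.CustomAdder")
    worker_error = None
except AttributeError as exc:
    worker_error = str(exc)

print(worker_error)
```

A parameter like the PyFunc code_path mentioned above addresses this kind of failure by shipping the defining source files with the model, so that the module on the loading side actually contains the class.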
I will let the product team know that this feature is needed for end-to-end feature management. I hope it makes a difference. Thanks again for reporting!

