How to fix "WARNING mlflow.utils.environment" when running MLflow in Databricks?
12-19-2022 07:23 AM
I'm running the following Python code from one of the Databricks training materials.
import mlflow
import mlflow.spark
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
with mlflow.start_run(run_name="LR-Single-Feature") as run:
    # Define pipeline
    vec_assembler = VectorAssembler(inputCols=["bedrooms"], outputCol="features")
    lr = LinearRegression(featuresCol="features", labelCol="price")
    pipeline = Pipeline(stages=[vec_assembler, lr])
    pipeline_model = pipeline.fit(train_df)
    # Log parameters
    mlflow.log_param("label", "price")
    mlflow.log_param("features", "bedrooms")
    # Log model
    mlflow.spark.log_model(pipeline_model, "model", input_example=train_df.limit(5).toPandas())
The last line, `mlflow.spark.log_model(pipeline_model, "model", input_example=train_df.limit(5).toPandas())`, caused the following warning:
WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /tmp/tmpchgj6je8, flavor: spark), fall back to return ['pyspark==3.3.0']. Set logging level to DEBUG to see the full traceback.
Can anyone help with the cause of this and how to fix it? Thanks very much!
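The warning itself suggests raising the logging level to see the full traceback. A minimal sketch of doing that with the standard `logging` module, assuming MLflow emits its messages under the `mlflow` logger namespace (the `mlflow.utils.environment` name in the warning suggests it does):

```python
import logging

# Raise verbosity for every logger under the "mlflow" namespace,
# including mlflow.utils.environment, which emitted the warning.
logging.getLogger("mlflow").setLevel(logging.DEBUG)
```

Run this before the `mlflow.spark.log_model(...)` call; the traceback should then appear alongside the warning.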
12-21-2022 06:18 AM
If you are trying to log a model, could you please try passing the sparknlp requirements via the extra_pip_requirements argument?
Please let us know if that helps.
12-21-2022 10:04 AM
Thank you debayan!
How do I pass sparknlp requirements to the extra_pip_requirements argument? Could you please send sample code? Thank you very much!
12-21-2022 10:11 AM
Also, I'm not using sparknlp, I'm just doing a simple linear regression. Thank you!
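On the sample-code request: a minimal sketch of passing `extra_pip_requirements` to `mlflow.spark.log_model()`. The helper name `log_with_extra_requirements` and the `spark-nlp` package in the usage comment are illustrative assumptions only. For a plain linear regression there may be nothing extra to list, and nothing in this thread confirms that this argument silences the warning.

```python
def log_with_extra_requirements(pipeline_model, input_example, extra_requirements):
    """Log a fitted Spark pipeline and append extra pip requirements (hypothetical helper)."""
    # Imported lazily so that defining the helper does not itself require MLflow.
    import mlflow.spark

    return mlflow.spark.log_model(
        pipeline_model,
        "model",
        input_example=input_example,
        # Appended to whatever requirements MLflow determines by default.
        extra_pip_requirements=extra_requirements,
    )


# Usage inside the `with mlflow.start_run(...)` block (package name is a placeholder):
# log_with_extra_requirements(pipeline_model, train_df.limit(5).toPandas(), ["spark-nlp"])
```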
01-06-2023 07:27 AM
I've encountered the same warning when running this notebook from DA.
I got rid of the warning by explicitly passing `conda_env=mlflow.spark.get_default_conda_env()` to `mlflow.spark.log_model()`.
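For reference, a minimal sketch of that workaround applied to the code from the question. The helper name `log_with_default_conda_env` is made up for illustration; `pipeline_model` and `train_df` are the objects from the original notebook.

```python
def log_with_default_conda_env(pipeline_model, input_example):
    """Log a fitted Spark pipeline with an explicit default Conda environment (hypothetical helper)."""
    # Imported lazily so that defining the helper does not itself require MLflow.
    import mlflow.spark

    return mlflow.spark.log_model(
        pipeline_model,
        "model",
        input_example=input_example,
        # Pass the default environment explicitly instead of leaving conda_env=None.
        conda_env=mlflow.spark.get_default_conda_env(),
    )


# Usage inside the `with mlflow.start_run(...)` block:
# log_with_default_conda_env(pipeline_model, train_df.limit(5).toPandas())
```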
The documentation reads
conda_env – Either a dictionary representation of a Conda environment or the path to a Conda environment yaml file. If provided, this describes the environment this model should be run in. At minimum, it should specify the dependencies contained in get_default_conda_env(). If None, the default get_default_conda_env() environment is added to the model.
So I'm not sure why my solution works.
The doc also reads
The following arguments can’t be specified at the same time:
- conda_env
- pip_requirements
- extra_pip_requirements
So I wonder if there is a special inference rule when all three are None by default.
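On the inference question: as far as I understand MLflow's behavior (an assumption, not something confirmed in this thread), when `conda_env` and `pip_requirements` are both left as None, MLflow tries to infer pip requirements from the saved model, and the warning is logged when that inference fails and it falls back to a default list. If that is right, passing `pip_requirements` explicitly should skip the inference step as well. A sketch under that assumption, with a hypothetical helper name:

```python
def log_with_explicit_requirements(pipeline_model, input_example, requirements):
    """Log a fitted Spark pipeline with an explicit pip requirements list (hypothetical helper)."""
    # Imported lazily so that defining the helper does not itself require MLflow.
    import mlflow.spark

    return mlflow.spark.log_model(
        pipeline_model,
        "model",
        input_example=input_example,
        # Explicit list; per the quoted docs, do not combine with conda_env.
        pip_requirements=requirements,
    )


# Usage inside the `with mlflow.start_run(...)` block, reusing the fallback
# value printed in the warning from the original post:
# log_with_explicit_requirements(pipeline_model, train_df.limit(5).toPandas(), ["pyspark==3.3.0"])
```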