
How to fix "WARNING mlflow.utils.environment" when run mlflow in Databricks?

jsu999
New Contributor II

I'm running the following Python code from one of the Databricks training materials.

import mlflow
import mlflow.spark
from pyspark.ml.regression import LinearRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.ml import Pipeline
from pyspark.ml.evaluation import RegressionEvaluator
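# NOTE: train_df is assumed to be created earlier in the training notebook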
 
with mlflow.start_run(run_name="LR-Single-Feature") as run:
    # Define pipeline
    vec_assembler = VectorAssembler(inputCols=["bedrooms"], outputCol="features")
    lr = LinearRegression(featuresCol="features", labelCol="price")
    pipeline = Pipeline(stages=[vec_assembler, lr])
    pipeline_model = pipeline.fit(train_df)
    
    # Log parameters
    mlflow.log_param("label", "price")
    mlflow.log_param("features", "bedrooms")
 
    # Log model
    mlflow.spark.log_model(pipeline_model, "model", input_example=train_df.limit(5).toPandas()) 

The last line of code, mlflow.spark.log_model(pipeline_model, "model", input_example=train_df.limit(5).toPandas()), caused the following warning:

WARNING mlflow.utils.environment: Encountered an unexpected error while inferring pip requirements (model URI: /tmp/tmpchgj6je8, flavor: spark), fall back to return ['pyspark==3.3.0']. Set logging level to DEBUG to see the full traceback.
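
The warning suggests setting the logging level to DEBUG to see the full traceback. I believe that would look something like this (using Python's standard logging module):

import logging

# Raise the mlflow logger to DEBUG so the traceback behind the
# pip-requirements inference failure is printed
logging.getLogger("mlflow").setLevel(logging.DEBUG)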

Can anyone help with the cause of this and method to fix it? Thanks very much!

4 REPLIES

Debayan
Esteemed Contributor III

Hi,

If you are trying to log a model, could you please try passing the sparknlp requirements to the extra_pip_requirements argument?
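
For example, something along these lines (a rough sketch; the requirement string below is only a placeholder and should list whatever packages your model actually needs):

import mlflow
import mlflow.spark

# Placeholder requirements, appended to the inferred/default ones;
# replace "spark-nlp" with the libraries your pipeline depends on.
mlflow.spark.log_model(
    pipeline_model,
    "model",
    extra_pip_requirements=["spark-nlp"],
)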

Please let us know if that helps.

jsu999
New Contributor II

Thank you, Debayan!

How do I pass the sparknlp requirements to the extra_pip_requirements argument? Could you please send sample code? Thank you very much!

jsu999
New Contributor II

Also, I'm not using sparknlp; I'm just doing a simple linear regression. Thank you!

Fed
New Contributor III

I've encountered the same warning when running this notebook from Databricks Academy (DA).

https://github.com/databricks-academy/scalable-machine-learning-with-apache-spark-english/blob/publi...

I've managed to get rid of that warning by explicitly defining the argument `conda_env=mlflow.spark.get_default_conda_env()` in `mlflow.spark.log_model()`.
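
For reference, the full call looks roughly like this (reusing pipeline_model and train_df from the original post):

import mlflow
import mlflow.spark

# Passing the default conda env explicitly appears to skip pip-requirement
# inference, which is what triggers the warning.
mlflow.spark.log_model(
    pipeline_model,
    "model",
    conda_env=mlflow.spark.get_default_conda_env(),
    input_example=train_df.limit(5).toPandas(),
)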

The documentation reads:

conda_env – Either a dictionary representation of a Conda environment or the path to a Conda environment yaml file. If provided, this describes the environment this model should be run in. At minimum, it should specify the dependencies contained in get_default_conda_env(). If None, the default get_default_conda_env() environment is added to the model.

So I'm not sure why my solution works.

The doc also reads:

The following arguments can’t be specified at the same time:

  • conda_env
  • pip_requirements
  • extra_pip_requirements

So I wonder if there is a special inference rule when all three are None by default.
