Databricks Community

Sangeethagk · a week ago

Hi Team, one of my customers is facing the issue below. Has anyone run into this before? Any help would be appreciated.

import mlflow

mlflow.set_registry_uri("databricks-uc")

catalog_name = "system"

embed = mlflow.pyfunc.spark_udf(spark, "models:/system.ai.bge_m3/1", "array<float>")

Running the code above produces the following error and warnings:

TypeError: ColSpec.__init__() got an unexpected keyword argument 'required'

WARNING mlflow.pyfunc: Detected one or more mismatches between the model's dependencies and the current Python environment: - mlflow (current: 2.7.1, required: mlflow==2.11.2) - torch (current: 2.0.1+cu118, required: torch==2.2.1) - transformers (current: 4.31.0, required: transformers==4.38.2) To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.

WARNING mlflow.pyfunc: Calling `spark_udf()` with `env_manager="local"` does not recreate the same environment that was used during training, which may lead to errors or inaccurate predictions. We recommend specifying `env_manager="conda"`, which automatically recreates the environment that was used to train the model and performs inference in the recreated environment.

Kaniz_Fatma · yesterday

Hi @Sangeethagk, It looks like you’re encountering a couple of issues related to mlflow.pyfunc.spark_udf() and model dependencies.

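TypeError: ColSpec.__init__() got an unexpected keyword argument 'required':
- This error most likely comes from the mlflow version mismatch reported in the first warning, not from the arguments you pass to mlflow.pyfunc.spark_udf().
- The model lists mlflow==2.11.2 as a requirement, while the cluster runs mlflow 2.7.1. The model's signature appears to carry a 'required' field on its column specs, which the ColSpec class in the older version does not accept.
- To resolve this, align the installed mlflow version with the one the model requires, then rerun the code.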
Model Dependencies Mismatch:
- The warning about model dependencies indicates that the current Python environment doesn’t match the environment in which the model was trained.
- To fix this, you can use mlflow.pyfunc.get_model_dependencies(model_uri) to fetch the model’s environment and install the required dependencies using the resulting environment file.
- Make sure your mlflow version matches the required version (2.11.2) and other dependencies are also aligned.
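
Before reinstalling anything, it can help to confirm which packages actually differ on the cluster. The following is a minimal sketch using only the Python standard library; the helper name find_mismatches is hypothetical, and the pinned versions are copied from the warning in the question.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the mlflow.pyfunc warning in the question.
REQUIRED = {"mlflow": "2.11.2", "torch": "2.2.1", "transformers": "4.38.2"}


def find_mismatches(required):
    """Return {package: (installed_or_None, required)} for each differing package."""
    mismatches = {}
    for name, wanted in required.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # the package is not installed at all
        # Ignore local build tags such as "+cu118" when comparing versions.
        if installed is None or installed.split("+")[0] != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in find_mismatches(REQUIRED).items():
    print(f"{name}: installed={installed}, required={wanted}")
```

Any package printed by the loop is one whose installed version differs from the version the model asks for. An empty result means the environment already matches these three pins.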
Environment Manager for spark_udf():
- The second warning suggests that using env_manager="local" with spark_udf() doesn’t recreate the same environment used during training.
- To avoid errors or inaccurate predictions, consider specifying env_manager="conda". This will automatically recreate the training environment for inference.
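
As a rough sketch of the env_manager suggestion, the call from the question can be wrapped as shown here. The wrapper name build_embed_udf is hypothetical; the model URI, registry URI, and result type are taken from the question. The thread does not confirm that this alone removes the TypeError, so the mlflow version on the cluster may still need to be upgraded first.

```python
MODEL_URI = "models:/system.ai.bge_m3/1"  # model URI from the question


def build_embed_udf(spark, model_uri=MODEL_URI, env_manager="conda"):
    """Return a Spark UDF that scores the model in a recreated environment."""
    # Deferred import so the helper can be defined where mlflow is absent.
    import mlflow

    mlflow.set_registry_uri("databricks-uc")
    return mlflow.pyfunc.spark_udf(
        spark,
        model_uri,
        result_type="array<float>",
        env_manager=env_manager,  # "conda" rebuilds the model's logged environment
    )
```

In a notebook where spark is already defined, embed = build_embed_udf(spark) would replace the original spark_udf() line.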

Remember to address these points, and your issue should be resolved. If you need further assistance, feel free to ask! 😊
