Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

TypeError: ColSpec.__init__() got an unexpected keyword argument 'required'

New Contributor

Hi Team, one of my customers is facing the issue below. Has anyone faced this before? Any help would be appreciated.

import mlflow


catalog_name = "system"

embed = mlflow.pyfunc.spark_udf(spark, f"models:/", "array<float>")

On running the above code, we get the following error:

TypeError: ColSpec.__init__() got an unexpected keyword argument 'required'

WARNING mlflow.pyfunc: Detected one or more mismatches between the model's dependencies and the current Python environment: - mlflow (current: 2.7.1, required: mlflow==2.11.2) - torch (current: 2.0.1+cu118, required: torch==2.2.1) - transformers (current: 4.31.0, required: transformers==4.38.2) To fix the mismatches, call `mlflow.pyfunc.get_model_dependencies(model_uri)` to fetch the model's environment and install dependencies using the resulting environment file.

WARNING mlflow.pyfunc: Calling `spark_udf()` with `env_manager="local"` does not recreate the same environment that was used during training, which may lead to errors or inaccurate predictions. We recommend specifying `env_manager="conda"`, which automatically recreates the environment that was used to train the model and performs inference in the recreated environment.


Community Manager

Hi @Sangeethagk, it looks like you’re encountering a couple of issues related to mlflow.pyfunc.spark_udf() and model dependencies.

  1. TypeError: ColSpec.__init__() got an unexpected keyword argument ‘required’:

    • This error typically means the model was logged with a newer MLflow version than the one loading it: the required field on ColSpec was introduced in an MLflow release newer than 2.7.1, so the older library cannot parse the model’s signature.
    • The dependency warning in your output confirms the mismatch (current: 2.7.1, required: mlflow==2.11.2).
    • To resolve this, upgrade mlflow on the cluster to match the version that logged the model, then restart the Python process and rerun spark_udf().
  2. Model Dependencies Mismatch:

    • The warning about model dependencies indicates that the current Python environment doesn’t match the environment in which the model was trained.
    • To fix this, you can use mlflow.pyfunc.get_model_dependencies(model_uri) to fetch the model’s environment and install the required dependencies using the resulting environment file.
    • Make sure your mlflow version matches the required version (2.11.2) and other dependencies are also aligned.
  3. Environment Manager for spark_udf():

    • The second warning suggests that using env_manager="local" with spark_udf() doesn’t recreate the same environment used during training.
    • To avoid errors or inaccurate predictions, consider specifying env_manager="conda". This will automatically recreate the training environment for inference.
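Putting points 2 and 3 together, here is a minimal sketch of the recommended flow. The helper name and the model URI in the usage comment are illustrative placeholders, not Databricks APIs, and the call assumes an active SparkSession named spark on the cluster:

```python
def load_embedding_udf(spark, model_uri, result_type="array<float>"):
    """Load a registered model as a Spark UDF, recreating its logged
    training environment instead of using the driver's local packages."""
    import mlflow

    # Optionally inspect the model's logged requirements first; this
    # returns the path to a requirements file you can pip-install.
    deps_path = mlflow.pyfunc.get_model_dependencies(model_uri)
    print(f"Model dependencies written to: {deps_path}")

    # env_manager="conda" rebuilds the logged environment for inference,
    # which avoids the version-skew error above.
    return mlflow.pyfunc.spark_udf(
        spark,
        model_uri,
        result_type=result_type,
        env_manager="conda",
    )

# Usage on a cluster (placeholder URI -- substitute your model's path):
# embed = load_embedding_udf(spark, "models:/<catalog>.<schema>.<model>/1")
```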

Remember to address these points, and your issue should be resolved. If you need further assistance, feel free to ask! 😊
