Generative AI
Explore discussions on generative artificial intelligence techniques and applications within the Databricks Community. Share ideas, challenges, and breakthroughs in this cutting-edge field.

Unable to add dependencies to mlflow.langchain.log_model

yashshingvi
New Contributor

Hello,

I'm doing this-

with mlflow.start_run(run_name="run1"):
    logged_chain_info = mlflow.langchain.log_model(
        # lc_model=os.path.join(os.getcwd(), 'full_chain'),  # this doesn't work either
        lc_model="/Workspace/Users/{user_name}/exp/deploy_chain.py",
        model_config="/Workspace/Users/{user_name}/exp/chain_config.yaml",
        artifact_path="exp_1_artifact",
        input_example=input_example,
        example_no_conversion=True,
        code_paths=["/Workspace/Users/{user_name}/exp/example_docs.py"],
    )

But when I do import example_docs in deploy_chain.py, it fails with "module not found" when I run the code above.

Similarly, if I add a PDF or image file to code_paths and access it with a relative path, it fails at the mlflow run step; if I use an absolute path instead, it fails while serving the endpoint (file not found).

 

How should I add dependent files to this?

 

1 REPLY

stbjelcevic
Databricks Employee

Hi @yashshingvi ,

Thanks for the details—this is a common gotcha with MLflow “models from code.”

Why your imports fail

  • code_paths are only added to sys.path when the model is loaded (for inference/serving), not while the driver is executing mlflow.langchain.log_model(...) to log the model.
  • In the code-based logging flow, MLflow runs your lc_model file (deploy_chain.py) during logging; any imports inside that file must already be importable in the current notebook/cluster environment at logging time (see the sketch below for what that file typically looks like).
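
For reference, here is a minimal sketch of what a models-from-code lc_model file can look like; the chain construction and the build_chain helper are assumptions for illustration, not your actual code:

    # deploy_chain.py -- executed by MLflow at logging time and again at load/serve time
    import mlflow
    import example_docs  # must be importable when log_model() runs, hence the sys.path step below

    # hypothetical chain construction; replace with your real LangChain objects
    config = mlflow.models.ModelConfig(development_config="chain_config.yaml")
    chain = example_docs.build_chain(config)  # assumed helper defined in example_docs.py

    # tell MLflow which object is the model being logged
    mlflow.models.set_model(chain)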

Make Python modules importable at logging time

Pick one of these:

  • Add the directory containing your helper modules (example_docs.py) to sys.path before calling log_model:

    import sys
    sys.path.append("/Workspace/Users/<user>/exp")  # folder that contains example_docs.py
    
    import mlflow
    
    with mlflow.start_run(run_name="run1"):
        logged_chain_info = mlflow.langchain.log_model(
            lc_model="/Workspace/Users/<user>/exp/deploy_chain.py",
            model_config="/Workspace/Users/<user>/exp/chain_config.yaml",
            artifact_path="exp_1_artifact",
            input_example=input_example,
            example_no_conversion=True,
            # include the whole directory so it’s available at load/serve time too
            code_paths=["/Workspace/Users/<user>/exp"],
        )

    This ensures import example_docs inside deploy_chain.py resolves during the logging step, and the directory is also packaged for serving.
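
    To confirm the packaging worked before you deploy, you can load the logged model back and invoke it; this is just a sanity check using names from the snippet above, not a required step:

    import mlflow

    # loading re-executes deploy_chain.py with code_paths on sys.path, so a successful
    # load means example_docs resolves the same way it will at serving time
    loaded = mlflow.pyfunc.load_model(logged_chain_info.model_uri)
    print(loaded.predict(input_example))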

  • Preferably, use a Databricks Repo and install your package in the logging environment:

    # One-time per cluster, or in your notebook before log_model
    %pip install -e /Workspace/Repos/<your_repo>/  # has pyproject.toml/setup.py
    
    # then log as usual

    This is the most reliable way to satisfy imports both at logging and serving time, as MLflow will capture and restore package dependencies.
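
    If you go the package route, you can also pin the package explicitly for the serving environment instead of relying only on automatic dependency inference; a sketch, where the package name and git URL are placeholders:

    logged_chain_info = mlflow.langchain.log_model(
        lc_model="/Workspace/Users/<user>/exp/deploy_chain.py",
        model_config="/Workspace/Users/<user>/exp/chain_config.yaml",
        artifact_path="exp_1_artifact",
        input_example=input_example,
        # installed into the serving environment in addition to the inferred requirements
        extra_pip_requirements=["my_helpers @ git+https://github.com/<org>/<repo>.git"],
    )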
