Unable to add dependencies to mlflow.langchain.log_model
08-07-2024 03:32 PM
Hello,
I'm doing this:

```python
with mlflow.start_run(run_name="run1"):
    logged_chain_info = mlflow.langchain.log_model(
        # lc_model=os.path.join(os.getcwd(), 'full_chain')  # this doesn't work either
        lc_model='/Workspace/Users/{user_name}/exp/deploy_chain.py',
        model_config="/Workspace/Users/{user_name}/exp/chain_config.yaml",
        artifact_path="exp_1_artifact",
        input_example=input_example,
        example_no_conversion=True,
        code_paths=["/Workspace/Users/{user_name}/exp/example_docs.py"]
    )
```

But when I do `import example_docs` in deploy_chain.py, it says module not found when I run the above code.
Similarly, if I try to add a PDF/image file in code_paths and access it using a relative path, it fails at the mlflow run step; if I use an absolute path instead, it fails while serving the endpoint (file not found).
How should I add dependent files to this?
11-10-2025 12:25 PM
Hi @yashshingvi,
Thanks for the details—this is a common gotcha with MLflow “models from code.”
Why your imports fail
- `code_paths` entries are only added to `sys.path` when the model is *loaded* (for inference/serving), not while the driver is executing `mlflow.langchain.log_model(...)` to log the model.
- In the code-based logging flow, MLflow runs your `lc_model` file (deploy_chain.py) during logging; any imports inside that file must already be importable in the current notebook/cluster environment at logging time.
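To see that mechanism in isolation, here is a minimal, MLflow-free sketch (the directory and module contents are made up for illustration): importing a helper module fails until its directory is on `sys.path`, which is essentially what MLflow does with your `code_paths` entries at load time, and what you must do yourself at logging time.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway directory with a helper module, standing in for
# /Workspace/Users/<user>/exp/example_docs.py
helpers_dir = Path(tempfile.mkdtemp())
(helpers_dir / "example_docs.py").write_text("DOCS = ['doc1', 'doc2']\n")

# Before the directory is on sys.path, the import fails...
try:
    import example_docs  # noqa: F401
    found_before = True
except ModuleNotFoundError:
    found_before = False

# ...after appending it, the same import succeeds.
sys.path.append(str(helpers_dir))
example_docs = importlib.import_module("example_docs")

print(found_before)       # False
print(example_docs.DOCS)  # ['doc1', 'doc2']
```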
Make Python modules importable at logging time
Pick one of these:
- Add the directory containing your helper modules (example_docs.py) to `sys.path` before calling `log_model`:

  ```python
  import sys
  sys.path.append("/Workspace/Users/<user>/exp")  # folder that contains example_docs.py

  import mlflow

  with mlflow.start_run(run_name="run1"):
      logged_chain_info = mlflow.langchain.log_model(
          lc_model="/Workspace/Users/<user>/exp/deploy_chain.py",
          model_config="/Workspace/Users/<user>/exp/chain_config.yaml",
          artifact_path="exp_1_artifact",
          input_example=input_example,
          example_no_conversion=True,
          # include the whole directory so it's available at load/serve time too
          code_paths=["/Workspace/Users/<user>/exp"],
      )
  ```

  This ensures `import example_docs` inside deploy_chain.py resolves during the logging step, and the directory is also packaged for serving.
- Preferably, use a Databricks Repo and install your package in the logging environment:

  ```
  # One-time per cluster, or in your notebook before log_model
  %pip install -e /Workspace/Repos/<your_repo>/  # repo with a pyproject.toml/setup.py
  # then log as usual
  ```

  This is the most reliable way to satisfy imports at both logging and serving time, since MLflow will capture and restore the package dependencies.
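For the PDF/image part of your question, one pattern that avoids both failure modes is to resolve the data file relative to the module's own location (`__file__`) rather than using a fixed absolute path or the current working directory. Because `code_paths` copies the whole directory alongside your model code, a `__file__`-relative path resolves correctly both in the workspace and inside the serving container. A minimal sketch, with illustrative file names, that simulates importing the copied module from elsewhere:

```python
import importlib
import os
import sys
import tempfile
from pathlib import Path

# Stand-in for /Workspace/Users/<user>/exp: a directory holding both the
# chain code and a data file that must travel with it via code_paths.
exp_dir = Path(tempfile.mkdtemp())
(exp_dir / "example.pdf").write_bytes(b"%PDF-1.4 fake")
(exp_dir / "deploy_chain.py").write_text(
    "import os\n"
    "# Resolve the data file relative to this module, not the CWD,\n"
    "# so the path works wherever the directory gets copied.\n"
    "PDF_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'example.pdf')\n"
)

# Simulate the serving environment importing the copied module while
# running from a different working directory.
sys.path.append(str(exp_dir))
deploy_chain = importlib.import_module("deploy_chain")

print(os.path.exists(deploy_chain.PDF_PATH))  # True
```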