stbjelcevic
Databricks Employee

Hi @yashshingvi ,

Thanks for the details. This is a common gotcha with MLflow "models from code."

Why your imports fail

  • Directories listed in code_paths are only added to sys.path when the model is loaded (for inference/serving), not while the driver is executing mlflow.langchain.log_model(...).
  • In the code-based logging flow, MLflow executes your lc_model file (deploy_chain.py) during logging, so any imports inside that file must already be resolvable in the current notebook/cluster environment at logging time.

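A self-contained sketch (standing in for the MLflow internals, which it does not call) of why this fails: a module that is not on sys.path at logging time cannot be imported, even though the same directory would be packaged via code_paths for serving. The temp directory here plays the role of your /Workspace folder.

```python
import sys
import tempfile
from pathlib import Path

# Create a stand-in for the folder holding example_docs.py.
workdir = Path(tempfile.mkdtemp())
(workdir / "example_docs.py").write_text("DOCS = ['doc1', 'doc2']\n")

# Before the directory is on sys.path, the import fails -- this is what
# happens inside deploy_chain.py during log_model if the folder was only
# passed via code_paths.
try:
    import example_docs  # noqa: F401
    found_before = True
except ImportError:
    found_before = False

# Mirror of sys.path.append("/Workspace/Users/<user>/exp").
sys.path.append(str(workdir))
import example_docs  # now resolvable

print(found_before, example_docs.DOCS)
```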
Make Python modules importable at logging time

Pick one of these:

  • Add the directory containing your helper modules (example_docs.py) to sys.path before calling log_model:

    import sys
    sys.path.append("/Workspace/Users/<user>/exp")  # folder that contains example_docs.py
    
    import mlflow
    
    with mlflow.start_run(run_name="run1"):
        logged_chain_info = mlflow.langchain.log_model(
            lc_model="/Workspace/Users/<user>/exp/deploy_chain.py",
            model_config="/Workspace/Users/<user>/exp/chain_config.yaml",
            artifact_path="exp_1_artifact",
            input_example=input_example,
            example_no_conversion=True,
            # include the whole directory so it’s available at load/serve time too
            code_paths=["/Workspace/Users/<user>/exp"],
        )

    This ensures import example_docs inside deploy_chain.py resolves during the logging step, and the directory is also packaged for serving.
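As a quick pre-flight (a hypothetical helper, not part of MLflow), you can verify that the directory you are about to pass as code_paths actually contains the modules deploy_chain.py imports, before spending time on log_model. The module names and temp directory below are illustrative.

```python
from pathlib import Path
import tempfile

def check_code_paths(code_dir: str, required_modules: list[str]) -> list[str]:
    """Return the module names that are missing from code_dir."""
    root = Path(code_dir)
    return [m for m in required_modules if not (root / f"{m}.py").exists()]

# Temp directory standing in for /Workspace/Users/<user>/exp.
d = tempfile.mkdtemp()
Path(d, "example_docs.py").write_text("DOCS = []\n")

missing = check_code_paths(d, ["example_docs", "other_helper"])
print(missing)  # -> ['other_helper']
```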

  • Preferably, use a Databricks Repo and install your package in the logging environment:

    # One-time per cluster, or in your notebook before log_model
    %pip install -e /Workspace/Repos/<your_repo>/  # repo root with pyproject.toml or setup.py
    dbutils.library.restartPython()  # restart so the newly installed package is importable
    
    # then log as usual

    This is the most reliable way to satisfy imports both at logging and serving time, as MLflow will capture and restore package dependencies.
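A small sanity check (a hypothetical snippet, not MLflow API) before logging: confirm the package you installed with %pip install -e is actually importable in the current Python process. "my_chain_pkg" is a placeholder for your package's name.

```python
import importlib.util

def is_importable(package_name: str) -> bool:
    """True if the package can be found on the current sys.path."""
    return importlib.util.find_spec(package_name) is not None

print(is_importable("json"))          # stdlib module, always present: True
print(is_importable("my_chain_pkg"))  # placeholder: False until installed
```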
