Databricks Community

naveen_marthala · ‎05-01-2022

I have an MLflow server running with `--serve-artifacts` and with Postgres as the `--backend-store-uri`. The machine running the server (a container with base image python:3.9-bullseye) has git installed and available on the PATH.

I am logging from Jupyter notebooks that also run in containers (with base image python:3.9-slim-bullseye), and these containers do not have git installed.

When I try to auto-log from the client like this:

import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression

mlflow.sklearn.autolog()
 
# prepare training data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
 
# train a model
model = LinearRegression()
model.fit(X, y)
run_id = mlflow.last_active_run().info.run_id
print("Logged data and model in run {}".format(run_id))

I get a warning that git is not installed, followed by more warnings and errors:

2022/05/01 14:21:41 WARNING mlflow.tracking.context.git_context: Failed to import Git (the Git executable is probably not on your PATH), so Git SHA is not available. Error: Failed to initialize: Bad git executable.
The git executable must be specified in one of the following ways:
    - be included in your $PATH
    - be set via $GIT_PYTHON_GIT_EXECUTABLE
    - explicitly set via git.refresh()
 
All git commands will error until this is rectified.
 
This initial warning can be silenced or aggravated in the future by setting the
$GIT_PYTHON_REFRESH environment variable. Use one of the following values:
    - quiet|q|silence|s|none|n|0: for no warning or exception
    - warn|w|warning|1: for a printed warning
    - error|e|raise|r|2: for a raised exception
 
Example:
    export GIT_PYTHON_REFRESH=quiet
 
2022/05/01 14:21:41 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e914209e05d449e6af817d0d692b1012', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2022/05/01 14:22:45 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: API request to http://host.docker.internal:5000/api/2.0/mlflow-artifacts/artifacts/1/e914209e05d449e6af817d0d692b10... failed with exception HTTPConnectionPool(host='host.docker.internal', port=5000): Max retries exceeded with url: /api/2.0/mlflow-artifacts/artifacts/1/e914209e05d449e6af817d0d692b1012/artifacts/model/model.pkl (Caused by ResponseError('too many 500 error responses'))
Logged data and model in run e914209e05d449e6af817d0d692b1012

I couldn't figure out why clients need to have git installed. I have been under the assumption that a client only needs to be able to send HTTP requests to the server and doesn't need anything else installed. What am I missing, and how can I avoid that warning, not by hiding it, but by actually fixing what is causing it?
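
For reference, the warning comes from the client-side logger `mlflow.tracking.context.git_context`, and its text lists the available options: put git on `$PATH`, point `$GIT_PYTHON_GIT_EXECUTABLE` at it, or set `$GIT_PYTHON_REFRESH`. Below is a minimal sketch of checking this from the notebook, assuming only the Python standard library. The helper names `git_on_path` and `configure_gitpython` are hypothetical and not part of MLflow or GitPython. Note that the `quiet` value only silences the warning; it does not make a Git SHA available.

```python
import os
import shutil


def git_on_path() -> bool:
    """Return True if a `git` executable is discoverable on this process's PATH."""
    return shutil.which("git") is not None


def configure_gitpython(env=os.environ) -> str:
    """Hypothetical helper: decide how the git check should behave on this client.

    If git is missing, set GIT_PYTHON_REFRESH=quiet (one of the values listed
    in the warning text) unless the variable is already set. If git is present,
    leave the environment untouched.
    """
    if git_on_path():
        return "git found"
    env.setdefault("GIT_PYTHON_REFRESH", "quiet")
    return "git missing; GIT_PYTHON_REFRESH=" + env["GIT_PYTHON_REFRESH"]


# Assumption: this runs before the first MLflow run is started
# (safest: before `import mlflow`), so the variable is set when the check happens.
print(configure_gitpython())
```

If `git_on_path()` returns False in the notebook container, installing git in that image (or setting `$GIT_PYTHON_GIT_EXECUTABLE`) would be the way to address the cause rather than the symptom, according to the options the warning itself lists.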