
Why does the client need to have git installed for auto-logging to an MLflow server running in `--serve-artifacts` mode?

naveen_marthala
Contributor

I have an MLflow server running with `--serve-artifacts` and with Postgres as the `--backend-store-uri`. The machine running the server (a container based on the python:3.9-bullseye image) has git installed and available on the PATH.

I am logging from Jupyter notebooks that also run in containers (based on the python:3.9-slim-bullseye image), and these do not have git installed.

When I try to auto-log from the client like this:

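import mlflow
import numpy as np
from sklearn.linear_model import LinearRegression

# enable autologging for scikit-learn estimators
mlflow.sklearn.autolog()

# prepare training data
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3

# train a model; autologging creates the run and logs the model
model = LinearRegression()
model.fit(X, y)
run_id = mlflow.last_active_run().info.run_id
print("Logged data and model in run {}".format(run_id))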

I get a warning that git is not installed, followed by more warnings and errors:

2022/05/01 14:21:41 WARNING mlflow.tracking.context.git_context: Failed to import Git (the Git executable is probably not on your PATH), so Git SHA is not available. Error: Failed to initialize: Bad git executable.
The git executable must be specified in one of the following ways:
    - be included in your $PATH
    - be set via $GIT_PYTHON_GIT_EXECUTABLE
    - explicitly set via git.refresh()
 
All git commands will error until this is rectified.
 
This initial warning can be silenced or aggravated in the future by setting the
$GIT_PYTHON_REFRESH environment variable. Use one of the following values:
    - quiet|q|silence|s|none|n|0: for no warning or exception
    - warn|w|warning|1: for a printed warning
    - error|e|raise|r|2: for a raised exception
 
Example:
    export GIT_PYTHON_REFRESH=quiet
 
2022/05/01 14:21:41 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID 'e914209e05d449e6af817d0d692b1012', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current sklearn workflow
2022/05/01 14:22:45 WARNING mlflow.utils.autologging_utils: Encountered unexpected error during sklearn autologging: API request to http://host.docker.internal:5000/api/2.0/mlflow-artifacts/artifacts/1/e914209e05d449e6af817d0d692b10... failed with exception HTTPConnectionPool(host='host.docker.internal', port=5000): Max retries exceeded with url: /api/2.0/mlflow-artifacts/artifacts/1/e914209e05d449e6af817d0d692b1012/artifacts/model/model.pkl (Caused by ResponseError('too many 500 error responses'))
Logged data and model in run e914209e05d449e6af817d0d692b1012

I couldn't figure out why the client needs to have git installed. I had assumed that a client only needs to be able to send HTTP requests to the server and doesn't need anything else installed. What am I missing, and how can I avoid that warning, not by hiding it, but by actually fixing what causes it?

2 REPLIES

Hubert-Dudek
Esteemed Contributor III

When it is part of the MLflow Project, it requires git.

@Hubert Dudek, I still haven't made anything a project, in the MLflow sense. So, would I need git irrespective of what I am trying to do?
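
For reference, the warning quoted in the question says that the Git SHA "is not available" and lists the environment variables GitPython checks (`GIT_PYTHON_GIT_EXECUTABLE` and `GIT_PYTHON_REFRESH`). The 500 responses on the artifact upload are reported by the server and appear to be a separate issue. Below is a minimal sketch, not an official MLflow recipe, of a client-side check that looks for a git executable and, if none is found, sets `GIT_PYTHON_REFRESH=quiet` as the warning text itself suggests. The helper name `silence_gitpython_if_git_missing` is hypothetical, and the assumption is that it runs before `mlflow` (and therefore GitPython) is imported.

```python
import os
import shutil


def silence_gitpython_if_git_missing(env=os.environ, which=shutil.which):
    """Hypothetical helper: return True if a git executable was found.

    If none is found, set GIT_PYTHON_REFRESH=quiet (unless the variable is
    already set) so that GitPython does not warn when it is first imported.
    """
    # Honour an explicit executable path first, then fall back to PATH lookup.
    candidate = env.get("GIT_PYTHON_GIT_EXECUTABLE") or "git"
    if which(candidate):
        return True
    # No git available: use the "quiet" value listed in the warning text.
    env.setdefault("GIT_PYTHON_REFRESH", "quiet")
    return False


# Assumption: call this before `import mlflow` in the notebook.
git_found = silence_gitpython_if_git_missing()
print("git found:", git_found)
```

With git absent, runs are still created, as the log in the question shows; they just carry no Git SHA. Installing git in the notebook image is the alternative if that information is wanted.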
