05-01-2022 06:08 AM
I am experimenting with MLflow in Docker containers.
I have Postgres running in Docker, and when I started the MLflow server against an empty database, everything worked as expected:
2022/05/01 13:57:45 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2022/05/01 13:57:45 INFO mlflow.store.db.utils: Updating database tables
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 451aebb31d03, add metric step
INFO [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
INFO [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
INFO [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
INFO [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
INFO [89d4b8295536_create_latest_metrics_table_py] Migration complete!
INFO [alembic.runtime.migration] Running upgrade 89d4b8295536 -> 2b4d017a5e9b, add model registry tables to db
INFO [2b4d017a5e9b_add_model_registry_tables_to_db_py] Adding registered_models and model_versions tables to database.
INFO [2b4d017a5e9b_add_model_registry_tables_to_db_py] Migration complete!
INFO [alembic.runtime.migration] Running upgrade 2b4d017a5e9b -> cfd24bdc0731, Update run status constraint with killed
INFO [alembic.runtime.migration] Running upgrade cfd24bdc0731 -> 0a8213491aaa, drop_duplicate_killed_constraint
INFO [alembic.runtime.migration] Running upgrade 0a8213491aaa -> 728d730b5ebd, add registered model tags table
INFO [alembic.runtime.migration] Running upgrade 728d730b5ebd -> 27a6a02d2cf1, add model version tags table
INFO [alembic.runtime.migration] Running upgrade 27a6a02d2cf1 -> 84291f40a231, add run_link to model_version
INFO [alembic.runtime.migration] Running upgrade 84291f40a231 -> a8c4a736bde6, allow nulls for run_id
INFO [alembic.runtime.migration] Running upgrade a8c4a736bde6 -> 39d1c3be5f05, add_is_nan_constraint_for_metrics_tables_if_necessary
INFO [alembic.runtime.migration] Running upgrade 39d1c3be5f05 -> c48cb773bb87, reset_default_value_for_is_nan_in_metrics_table_for_mysql
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
But when I started a new container running the MLflow server against the very same database, I got migration errors. Full traceback:
2022/05/01 16:43:28 ERROR mlflow.cli: Error initializing backend store
2022/05/01 16:43:28 ERROR mlflow.cli: Detected out-of-date database schema (found version bd07f7e963c5, but expected c48cb773bb87). Take a backup of your database, then run 'mlflow db upgrade <database_uri>' to migrate your database to the latest schema. NOTE: schema migration may result in database downtime - please consult your database's documentation for more detail.
Traceback (most recent call last):
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/cli.py", line 411, in server
initialize_backend_stores(backend_store_uri, default_artifact_root)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/server/handlers.py", line 258, in initialize_backend_stores
_get_tracking_store(backend_store_uri, default_artifact_root)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/server/handlers.py", line 243, in _get_tracking_store
_tracking_store = _tracking_store_registry.get_store(store_uri, artifact_root)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/registry.py", line 39, in get_store
return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/registry.py", line 49, in _get_store_with_resolved_uri
return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/server/handlers.py", line 111, in _get_sqlalchemy_store
return SqlAlchemyStore(store_uri, artifact_uri)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/store/tracking/sqlalchemy_store.py", line 141, in __init__
mlflow.store.db.utils._verify_schema(self.engine)
File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/store/db/utils.py", line 53, in _verify_schema
raise MlflowException(
mlflow.exceptions.MlflowException: Detected out-of-date database schema (found version bd07f7e963c5, but expected c48cb773bb87). Take a backup of your database, then run 'mlflow db upgrade <database_uri>' to migrate your database to the latest schema. NOTE: schema migration may result in database downtime - please consult your database's documentation for more detail.
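To see which schema revision is actually stored in the database before deciding whether to run `mlflow db upgrade`, a small check like the one below can help. This is a minimal sketch, assuming MLflow keeps its migration state in Alembic's default `alembic_version` table; `current_schema_revision` is a hypothetical helper, not an MLflow API, and it works with any DB-API connection (psycopg2 for Postgres, sqlite3 in the self-contained demo).

```python
import sqlite3


def current_schema_revision(conn):
    """Return the Alembic revision stored in the database, or None if the table is empty.

    `conn` is any DB-API 2.0 connection. Assumes the default Alembic version
    table name, `alembic_version` (an assumption, not confirmed by the thread).
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT version_num FROM alembic_version")
        row = cur.fetchone()
    finally:
        cur.close()
    return row[0] if row else None


if __name__ == "__main__":
    # Self-contained demo on an in-memory SQLite database. Against Postgres you
    # would pass e.g. psycopg2.connect(<your database URI>) instead.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE alembic_version (version_num VARCHAR(32) NOT NULL)")
    conn.execute("INSERT INTO alembic_version VALUES ('bd07f7e963c5')")
    print(current_schema_revision(conn))
```

Comparing the returned value with the "expected" revision in the error message shows whether the database and the installed MLflow version disagree.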
In the future, I plan to move this entire setup to AWS ECS (with Fargate), with Postgres on AWS RDS. When the containers running the MLflow server restart, it won't be possible for me to empty the database or to migrate the schema by running `mlflow db upgrade <database_uri>`, since I will be on serverless containers. How do I get around this, so that MLflow servers can start as containers keep restarting and continue to use the same Postgres database?
Also, this server is running in "--serve-artifacts" mode.
06-15-2022 12:45 AM
Here is the fix I found; it has been working without any issues on my side.
The Postgres server must be started first, and the MLflow server should be started only after Postgres is up. I had been starting the MLflow server while my Postgres server was still starting up.
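One way to enforce that ordering inside the MLflow container is a small entrypoint that polls the database port and only then launches the server. The sketch below assumes Python is available in the image; `wait_for_port` and the environment variable names (`DB_HOST`, `DB_PORT`, `BACKEND_STORE_URI`) are hypothetical, while `--backend-store-uri`, `--serve-artifacts` and `--host` are existing `mlflow server` options. An open TCP port is only a coarse readiness signal (Postgres can accept connections while still finishing startup), so treat this as an illustration rather than a complete health check.

```python
import os
import socket
import sys
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the database is not reachable yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Hypothetical environment variable names; adapt them to your own setup.
    db_host = os.environ.get("DB_HOST", "localhost")
    db_port = int(os.environ.get("DB_PORT", "5432"))
    backend_uri = os.environ["BACKEND_STORE_URI"]

    if not wait_for_port(db_host, db_port):
        sys.exit(f"Postgres at {db_host}:{db_port} did not become reachable in time")

    # Replace this process with the MLflow server once the database is reachable.
    os.execvp("mlflow", [
        "mlflow", "server",
        "--backend-store-uri", backend_uri,
        "--serve-artifacts",
        "--host", "0.0.0.0",
    ])
```

Using `os.execvp` makes the MLflow server the container's main process, so it receives stop signals directly. With RDS, the database is not a container in the same task, which is why the wait happens inside the MLflow container here.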
05-01-2022 07:48 AM
Hi @Naveen Marthala, I looked into the code and found the GitHub link.
05-01-2022 07:53 AM
Hello @Prabakar Ammeappin, I would like to learn how to fix this and launch the MLflow server with a pre-existing schema. Why would the server need to generate a new schema every time anyway? And I guess the source code is not going to be of much help to me. Or am I missing something?
06-14-2022 11:50 PM
No thanks, I have fixed it.
06-15-2022 02:27 AM
A step-by-step guide to setting up MLflow with a Postgres DB for storing metadata, and a systemd unit to keep it running.