
How to start an mlflow server with a postgres backend already filled with metadata of many experiments?

naveen_marthala
Contributor

I am experimenting with MLflow in Docker containers.

I have Postgres running in Docker. When I started the MLflow server against an empty database, everything worked as expected:

2022/05/01 13:57:45 INFO mlflow.store.db.utils: Creating initial MLflow database tables...
2022/05/01 13:57:45 INFO mlflow.store.db.utils: Updating database tables
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.
INFO  [alembic.runtime.migration] Running upgrade  -> 451aebb31d03, add metric step
INFO  [alembic.runtime.migration] Running upgrade 451aebb31d03 -> 90e64c465722, migrate user column to tags
INFO  [alembic.runtime.migration] Running upgrade 90e64c465722 -> 181f10493468, allow nulls for metric values
INFO  [alembic.runtime.migration] Running upgrade 181f10493468 -> df50e92ffc5e, Add Experiment Tags Table
INFO  [alembic.runtime.migration] Running upgrade df50e92ffc5e -> 7ac759974ad8, Update run tags with larger limit
INFO  [alembic.runtime.migration] Running upgrade 7ac759974ad8 -> 89d4b8295536, create latest metrics table
INFO  [89d4b8295536_create_latest_metrics_table_py] Migration complete!
INFO  [alembic.runtime.migration] Running upgrade 89d4b8295536 -> 2b4d017a5e9b, add model registry tables to db
INFO  [2b4d017a5e9b_add_model_registry_tables_to_db_py] Adding registered_models and model_versions tables to database.
INFO  [2b4d017a5e9b_add_model_registry_tables_to_db_py] Migration complete!
INFO  [alembic.runtime.migration] Running upgrade 2b4d017a5e9b -> cfd24bdc0731, Update run status constraint with killed
INFO  [alembic.runtime.migration] Running upgrade cfd24bdc0731 -> 0a8213491aaa, drop_duplicate_killed_constraint
INFO  [alembic.runtime.migration] Running upgrade 0a8213491aaa -> 728d730b5ebd, add registered model tags table
INFO  [alembic.runtime.migration] Running upgrade 728d730b5ebd -> 27a6a02d2cf1, add model version tags table
INFO  [alembic.runtime.migration] Running upgrade 27a6a02d2cf1 -> 84291f40a231, add run_link to model_version
INFO  [alembic.runtime.migration] Running upgrade 84291f40a231 -> a8c4a736bde6, allow nulls for run_id
INFO  [alembic.runtime.migration] Running upgrade a8c4a736bde6 -> 39d1c3be5f05, add_is_nan_constraint_for_metrics_tables_if_necessary
INFO  [alembic.runtime.migration] Running upgrade 39d1c3be5f05 -> c48cb773bb87, reset_default_value_for_is_nan_in_metrics_table_for_mysql
INFO  [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO  [alembic.runtime.migration] Will assume transactional DDL.

But when I started a new container running the MLflow server against the very same database, I got migration errors. Full traceback:

2022/05/01 16:43:28 ERROR mlflow.cli: Error initializing backend store
2022/05/01 16:43:28 ERROR mlflow.cli: Detected out-of-date database schema (found version bd07f7e963c5, but expected c48cb773bb87). Take a backup of your database, then run 'mlflow db upgrade <database_uri>' to migrate your database to the latest schema. NOTE: schema migration may result in database downtime - please consult your database's documentation for more detail.
Traceback (most recent call last):
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/cli.py", line 411, in server
    initialize_backend_stores(backend_store_uri, default_artifact_root)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/server/handlers.py", line 258, in initialize_backend_stores
    _get_tracking_store(backend_store_uri, default_artifact_root)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/server/handlers.py", line 243, in _get_tracking_store
    _tracking_store = _tracking_store_registry.get_store(store_uri, artifact_root)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/registry.py", line 39, in get_store
    return self._get_store_with_resolved_uri(resolved_store_uri, artifact_uri)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/tracking/_tracking_service/registry.py", line 49, in _get_store_with_resolved_uri
    return builder(store_uri=resolved_store_uri, artifact_uri=artifact_uri)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/server/handlers.py", line 111, in _get_sqlalchemy_store
    return SqlAlchemyStore(store_uri, artifact_uri)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/store/tracking/sqlalchemy_store.py", line 141, in __init__
    mlflow.store.db.utils._verify_schema(self.engine)
  File "/home/naveend/.local/lib/python3.9/site-packages/mlflow/store/db/utils.py", line 53, in _verify_schema
    raise MlflowException(
mlflow.exceptions.MlflowException: Detected out-of-date database schema (found version bd07f7e963c5, but expected c48cb773bb87). Take a backup of your database, then run 'mlflow db upgrade <database_uri>' to migrate your database to the latest schema. NOTE: schema migration may result in database downtime - please consult your database's documentation for more detail.

In the future, I plan to move this entire setup to AWS ECS (on Fargate) with Postgres on AWS RDS. When the containers running the MLflow server restart, it won't be possible for me to empty the database or to migrate the schema by running `mlflow db upgrade <database_uri>`, since I will be on serverless containers. How do I get around this, so that MLflow servers can start as containers keep restarting and continue to use the same Postgres database?

This server is running in `--serve-artifacts` mode.
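
One possible way to apply the migration that the error message asks for, even without shell access to the container, is to run it as part of the container's start-up command. The sketch below is only an illustration under assumptions, not a tested deployment recipe: the function names and the default host and port are hypothetical. `mlflow db upgrade` is the command quoted in the error message, and `--backend-store-uri` and `--serve-artifacts` are `mlflow server` options. Take a backup first, as the error message advises, and keep in mind that several containers starting at the same time would each attempt the upgrade.

```python
import subprocess


def upgrade_command(db_uri):
    """Command that migrates the MLflow schema, as named in the error message."""
    return ["mlflow", "db", "upgrade", db_uri]


def server_command(db_uri, host="0.0.0.0", port=5000):
    """Command that starts the tracking server against the same database."""
    return [
        "mlflow", "server",
        "--backend-store-uri", db_uri,
        "--serve-artifacts",
        "--host", host,
        "--port", str(port),
    ]


def upgrade_then_serve(db_uri):
    # check=True raises if the migration fails, so the server
    # is not started after a failed upgrade.
    subprocess.run(upgrade_command(db_uri), check=True)
    subprocess.run(server_command(db_uri), check=True)
```

Calling `upgrade_then_serve(...)` with the database URI from the container's entrypoint would run both steps in order. This only helps if the schema is actually behind the version the server expects.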

Accepted Solution

naveen_marthala
Contributor

Here is the fix I found; it has been working without any problems on my side.

The Postgres server must be started first, and the MLflow server should be started only after Postgres is up. I was starting the MLflow server while my Postgres server was still starting up.
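
The start-up ordering described above could be enforced in the MLflow container itself by waiting for Postgres before launching the server. The following is a minimal sketch, assuming the database is reachable over TCP; `wait_for_port`, `start_mlflow_when_ready`, and the timeout values are hypothetical names and numbers chosen for illustration. An open port only shows that something is listening, so Postgres may still need a moment before it accepts queries.

```python
import socket
import subprocess
import time


def wait_for_port(host, port, timeout=60.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not reachable yet (refused, timed out, or name not resolvable); retry.
            time.sleep(interval)
    return False


def start_mlflow_when_ready(db_host, db_port, db_uri):
    """Start the MLflow server only after the database port is reachable."""
    if not wait_for_port(db_host, db_port):
        raise RuntimeError(f"database at {db_host}:{db_port} did not come up in time")
    subprocess.run(
        ["mlflow", "server", "--backend-store-uri", db_uri, "--serve-artifacts"],
        check=True,
    )
```

With this approach, a restarted container simply waits until the database is reachable instead of racing it.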


Replies

Prabakar
Databricks Employee

Hi @Naveen Marthala, I looked for the code and found the GitHub link.

Hello @Prabakar Ammeappin, I would like to learn how to fix that and launch the MLflow server with a pre-existing schema. Why would the server need to generate a new schema every time anyway? I guess the source code is not going to be of much help to me. Or am I missing something?

No thanks, I have fixed it.



Anonymous
Not applicable

A step-by-step guide to setting up MLflow with a Postgres database for storing metadata and a systemd unit to keep it running.

  1. Setup MLflow in Production
  2. MLflow: Basic logging functions.
  3. MLflow logging for TensorFlow.
  4. MLflow Projects.
  5. Retrieving the best model using Python API for MLflow.
  6. Serving a model using MLflow.
