11-04-2024 09:38 AM
Hi all,
I have been using databricks-connect with serverless compute to develop and debug my Databricks-related code, and it has worked great so far. Now I have started integrating MLflow into my workflow, and I am encountering an issue. When I run the following code, I get an exception out of the Spark runtime:
```python
import mlflow
import databricks.connect as db_connect

mlflow.login()  # This prints an INFO log: Login successful!
# mlflow.set_model_uri("databricks")
spark_ctx = db_connect.DatabricksSession.builder.serverless(True).getOrCreate()
train_and_log_ml_model(spark_ctx)
```
The error message is the following:
```
pyspark.errors.exceptions.connect.AnalysisException: [CONFIG_NOT_AVAILABLE] Configuration spark.mlflow.modelRegistryUri is not available. SQLSTATE: 42K0I
```
What am I missing? Is there a way to make it work?
Greetings, Daniel
P.S.: My environment is quite bare-bones: a new Python venv where I pip-installed `databricks-connect==15.1` and `mlflow`. I have configured the Databricks CLI to use SSO, with a DEFAULT profile in `~/.databrickscfg`.
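For reference, a minimal sanity check of the Databricks Connect side alone (no MLflow involved) looks like this in my environment; the session builder call is the same one used above, and `spark.range` is just a trivial query:

```python
from databricks.connect import DatabricksSession

# Open a serverless session via the DEFAULT profile from ~/.databrickscfg
spark = DatabricksSession.builder.serverless(True).getOrCreate()

# Trivial query to confirm the connection works; this part runs fine,
# only the MLflow integration fails.
print(spark.range(5).collect())
```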
Accepted Solutions
11-04-2024 12:01 PM
The error you are encountering, `pyspark.errors.exceptions.connect.AnalysisException: [CONFIG_NOT_AVAILABLE] Configuration spark.mlflow.modelRegistryUri is not available. SQLSTATE: 42K0I`, is a known issue when using MLflow with serverless compute in Databricks. It arises because the configuration `spark.mlflow.modelRegistryUri` is not set by default in serverless environments.
To resolve this, you can use a workaround that sets the registry URI manually. Here is a modified version of your code that includes this workaround:
```python
import mlflow
import databricks.connect as db_connect
import mlflow.tracking._model_registry.utils

# Workaround: override the private helper so MLflow does not try to read
# spark.mlflow.modelRegistryUri from the (serverless) Spark session.
mlflow.tracking._model_registry.utils._get_registry_uri_from_spark_session = lambda: "databricks-uc"

mlflow.login()  # This prints an INFO log: Login successful!
# mlflow.set_model_uri("databricks")
spark_ctx = db_connect.DatabricksSession.builder.serverless(True).getOrCreate()
train_and_log_ml_model(spark_ctx)
```
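If you prefer not to patch a private MLflow helper, a possible alternative is to set the tracking and registry URIs through MLflow's public API before any registry call. This is only a sketch, and I have not verified that it avoids the Spark-config lookup on serverless; if it does not, the patch above remains the working option:

```python
import mlflow
import databricks.connect as db_connect

# Untested alternative: point MLflow at the Databricks tracking server and the
# Unity Catalog model registry explicitly, instead of patching the private helper.
mlflow.set_tracking_uri("databricks")
mlflow.set_registry_uri("databricks-uc")

mlflow.login()
spark_ctx = db_connect.DatabricksSession.builder.serverless(True).getOrCreate()
train_and_log_ml_model(spark_ctx)
```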