Re: Using Datbricks Connect with serverless comput...

DaPo · ‎11-04-2024

Hi all,

I have been using databricks-connect with serverless compute to develop and debug my databricks related code. It worked great so far. Now I started integrating ML-Flow in my workflow, and I am encountering an issue. When I run the following code, I get an exception out of the spark runtime.

import mlflow
import databricks.connect as db_connect

mlflow.login(). # This prints an INFO-log: Login successfull!
# mlflow.set_model_uri("databricks)
spark_ctx = db_connect.DatbricksSession.builder.serverless(True).getOrCreate()
train_and_log_ml_model(spark_ctx)

The error message is the following:

pyspark.errors.exceptions.connect.AnalysisException: [CONFIG_NOT_AVAILABLE] Configuration spark.mlflow.modelRegistryUri is not available. SQLSTATE: 42K0I

What am I missing? I there a way, to make it work?

Greetings, Daniel

P.S.: My environment is quite bare-bones: A new python-venv, where I pip installed `databricks-connect==15.1` and `mlflow`. I have configured the databricks-cli to use SSO, with a DEFAULT profile in the file `~/.databrickscfg`.

Walter_C · ‎11-04-2024

The error you are encountering, pyspark.errors.exceptions.connect.AnalysisException: [CONFIG_NOT_AVAILABLE] Configuration spark.mlflow.modelRegistryUri is not available. SQLSTATE: 42K0I, is a known issue when using MLflow with serverless clusters in Databricks. This issue arises because the configuration spark.mlflow.modelRegistryUri is not set by default in serverless environments.

To resolve this issue, you can use a workaround that involves setting the registry URI manually. Here is a modified version of your code that includes this workaround:

import mlflow
import databricks.connect as db_connect
import mlflow.tracking._model_registry.utils

# Workaround to set the registry URI manually
mlflow.tracking._model_registry.utils._get_registry_uri_from_spark_session = lambda: "databricks-uc"

mlflow.login() # This prints an INFO-log: Login successful!
# mlflow.set_model_uri("databricks")
spark_ctx = db_connect.DatbricksSession.builder.serverless(True).getOrCreate()
train_and_log_ml_model(spark_ctx)

View solution in original post

irfanghat · ‎05-05-2025

Is this a problem with the serverless compute resource, I tried using the approach you recommended but still run into some issues even though I successfully connect to the tracking server.

Using Datbricks Connect with serverless compute and MLflow