Community Platform Discussions
Connect with fellow community members to discuss general topics related to the Databricks platform, industry trends, and best practices. Share experiences, ask questions, and foster collaboration within the community.

How to Resolve ConnectTimeoutError When Registering Models with MLflow

otara_geni
New Contributor

Hello everyone,

I'm trying to register a model with MLflow in Databricks, but the following command raises an error:

model_version = mlflow.register_model(f"runs:/{run_id}/random_forest_model", model_name)

The error message is as follows:

ConnectTimeoutError: Connect timeout on endpoint URL: "https://s3.amazonaws.com/***(unitycatalog-s3bucket-name)?location"
File /databricks/python/lib/python3.10/site-packages/urllib3/connection.py:174, in HTTPConnection._new_conn(self)
    173 try:
--> 174     conn = connection.create_connection(
    175         (self._dns_host, self.port), self.timeout, **extra_kw
    176     )
    178 except SocketTimeout:
File /databricks/python/lib/python3.10/site-packages/botocore/httpsession.py:490, in URLLib3Session.send(self, request)
    486     raise ProxyConnectionError(
    487         proxy_url=mask_proxy_url(proxy_url), error=e
    488     )
    489 except URLLib3ConnectTimeoutError as e:
--> 490     raise ConnectTimeoutError(endpoint_url=request.url, error=e)
    491 except URLLib3ReadTimeoutError as e:
    492     raise ReadTimeoutError(endpoint_url=request.url, error=e)
This command comes from the Databricks Scikit-learn notebook referenced on this site.

It's worth mentioning that registering Delta tables works without any issues.

What could be causing this error? And how can it be resolved?

Thank you in advance for your help.

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @otara_geni, the ConnectTimeoutError you’re encountering when registering a model with MLflow in Databricks means the client timed out while trying to connect to the specified endpoint URL.

  • The error message indicates that there’s a timeout when connecting to the URL “https://s3.amazonaws.com/***(unitycatalog-s3bucket-name)?location”.
  • Verify that your Databricks cluster has network connectivity to the specified S3 bucket. Ensure that there are no firewall rules or network restrictions blocking the connection.
  • You can test connectivity from a notebook using tools like nc (netcat) or curl. For example:
    %sh nc -zv <hostname> <port>
    %sh curl -vvv <URL>
  • If there are no issues on the networking side, proceed to the next steps.
  • As an immediate workaround, consider adding retries around the registration call. A couple of retries with a short sleep interval (e.g., 1 second) between attempts is usually enough.
  • This can help mitigate transient network issues causing the timeout.

Example:

import time
from botocore.exceptions import ConnectTimeoutError

retries = 3
for attempt in range(retries):
    try:
        model_version = mlflow.register_model(f"runs:/{run_id}/random_forest_model", model_name)
        break
    except ConnectTimeoutError:
        if attempt == retries - 1:
            raise  # give up after the last attempt
        time.sleep(1)  # Wait for 1 second before retrying

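The nc/curl reachability checks suggested earlier can also be run from pure Python with the standard library, which is handy if shell access is restricted. This is a minimal sketch; `can_connect` is a hypothetical helper, and the host/port are whatever endpoint you are probing (e.g. the S3 host on 443):

```python
import socket

def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe a local listener so the example is self-contained;
# in practice you would probe the S3 endpoint from the error message.
server = socket.socket()
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()
print(can_connect(host, port))  # → True
server.close()
```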
  • Another possibility is that the connection pool used for managing reusable connections is full.
  • This can happen when your code makes multiple HTTP requests concurrently (such as downloading model files), and the pool reaches its maximum capacity.
  • Check if you’re making multiple concurrent requests and consider adjusting the connection pool size if needed.
  • The default pool size is 10, but you can increase it if necessary.
  • Ensure that your MLflow configuration (e.g., S3 endpoint, access keys, etc.) is correctly set up.
  • Double-check the bucket name and other relevant parameters.
  • Verify that the credentials have the necessary permissions to access the S3 bucket.
  • Refer to the Databricks documentation on MLflow for additional guidance.
  • Explore community discussions related to similar issues, such as this thread and this one.
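On the pool-size point above: the default of 10 is urllib3's per-host pool size, which botocore uses under the hood (urllib3 also appears in the traceback). A generic urllib3 sketch of a larger pool is below; note that MLflow's internal S3 client builds its own pool, so with plain boto3 you would instead pass `max_pool_connections` via `botocore.config.Config`:

```python
import urllib3

# urllib3's per-host connection pool defaults to maxsize=10; raising it
# avoids "connection pool is full" stalls under many concurrent requests
# to the same host (e.g. parallel artifact uploads).
pool = urllib3.PoolManager(maxsize=50)
print(pool.connection_pool_kw["maxsize"])  # → 50
```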

Good luck! 🚀
some-rsa
New Contributor II

@otara_geni if you are still struggling, try this: set the environment variable in your code, just before logging the model, to the URL of the regional S3 endpoint. From the error it looks like MLflow is attempting to use the global endpoint, which may not work in some cases, e.g. if you are using a back-end private link. Use just the domain name, not the full path.

Make sure to use your Databricks/bucket regional endpoint, e.g. `https://s3.eu-west-1.amazonaws.com` for `eu-west-1`.

import os
os.environ['MLFLOW_S3_ENDPOINT_URL'] = 'https://s3.<region>.amazonaws.com'
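To make the ordering explicit (the variable must be set before any registry call), here is a small sketch; `use_regional_s3_endpoint` is a hypothetical helper, and `eu-west-1` is just an example region:

```python
import os

def use_regional_s3_endpoint(region: str) -> str:
    """Point MLflow's S3 client at a regional endpoint (domain only, no path).

    Hypothetical helper: replace `region` with your bucket's actual region.
    """
    url = f"https://s3.{region}.amazonaws.com"
    os.environ["MLFLOW_S3_ENDPOINT_URL"] = url
    return url

# Call this BEFORE mlflow.register_model so the S3 client picks it up:
print(use_regional_s3_endpoint("eu-west-1"))  # → https://s3.eu-west-1.amazonaws.com
```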

HTH, Niko
