cancel
Showing results for 
Search instead for 
Did you mean: 
Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.
cancel
Showing results for 
Search instead for 
Did you mean: 

Databricks connection pool(ing) using python

prateekgoyal
New Contributor II

I have a python application which I need to connect with databricks (Databricks SQL Connector). The process for that is quite simple -

1. connect with dbr
2. execute your query
3. get the result
4. close the connection

Example from - https://docs.databricks.com/en/dev-tools/python-sql-connector.html#manage-cursors-and-connections

 

from databricks import sql
import os

connection = sql.connect(server_hostname = os.getenv("DATABRICKS_SERVER_HOSTNAME"),
http_path = os.getenv("DATABRICKS_HTTP_PATH"),
access_token = os.getenv("DATABRICKS_TOKEN"))

cursor = connection.cursor()

cursor.execute("SELECT * from range(10)")
print(cursor.fetchall())

cursor.close()
connection.close()

 

But the issue here is that connecting to dbr takes almost 15-16 seconds in best case scenarios which is too much for an API I am building.

So to overcome, I implemented a custom "connection pooling" technique -

 

from databricks import sql
from app.utils.constants import DATABRICKS_SERVER_HOSTNAME, DATABRICKS_HTTP_PATH, DATABRICKS_TOKEN
import threading
import logging

logging.basicConfig(level=logging.INFO)

class ConnectionPool:
def __init__(self, max_size=5):
self.pool = []
self.max_size = max_size
self.lock = threading.Lock()

def create_connection(self):
logging.info("Creating a new connection")
return sql.connect(server_hostname=DATABRICKS_SERVER_HOSTNAME,
http_path=DATABRICKS_HTTP_PATH,
access_token=DATABRICKS_TOKEN)

def get_connection(self):
with self.lock:
if self.pool:
logging.info(f"Reusing connection from pool. Pool size before: {len(self.pool)}")
return self.pool.pop()
else:
logging.info("No available connections in pool, creating a new one")
return self.create_connection()

def release_connection(self, connection):
with self.lock:
if len(self.pool) < self.max_size:
self.pool.append(connection)
logging.info(f"Connection released back to pool. Pool size now: {len(self.pool)}")
else:
connection.close()
logging.info("Pool is full. Closing the connection")

connection_pool = ConnectionPool(max_size=5)

 

I know there are various factors which I have to factor as part of "best practices" like Manage Connection Lifetimes (Connection Timeout, Max Connection Lifetime), Connection Validation and Retry Logic - which I believe can be a topic of future discussions.

Now my main point currently is, the technique I have used is good enough for a basic production PoC or connection pooling can deteriorate performance (and be a mistake in long-run) or is there any out-of-the box solution for something like this (for ex - in rails active-record also has similar concept but built-in) ?

Any thoughts/suggestions/reviews are welcome 🙂

1 REPLY 1

prateekgoyal
New Contributor II

Hi @Retired_mod, thanks for your reply. I have 2 follow-up questions for you - 

  1. When you say "consider managing connection lifetimes by closing and recreating them periodically" - the problem I face with this is when my API request comes in and I create and close a connection in its synchronous lifecycle, the connection times is almost 15-16 seconds which adds this much to API's overall response time, which is not very suitable for our application. Am I understanding correctly what you want to say ?
  2. You suggested to use "third-party libraries like SQLAlchemy and DBUtils" - You mean don't connect to databricks directly but through these libraries to take advantage of connection pooling?

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group