a week ago
I have an API that triggers Spark calculations. The API is hosted in a Python 3.12 pod in AKS and connects to a Databricks cluster using Databricks 18.1.1.
Initially I was calling getOrCreate on my API requests, and it all worked.
The problem is that the Spark session is shared: after a while, when a new API request comes in, it fails with "INVALID SESSION". It looks like this is because the cluster expired the Spark session after it had been inactive for long enough.
I also felt that sharing the same Spark session is not what I intended, as I want to isolate each API call/request and create a NEW Spark session per request.
So now I am creating a NEW Spark session per API request and don't have any issues.
But do I need to do any clean up?
If I am creating a NEW session, I feel tempted to call STOP on the session or do some clean up once I am done.
But the documentation seems to suggest never calling STOP on the session explicitly.
I am not seeing any issues so far by not calling STOP, but I am not sure whether this causes any resource leaks, and I want to do what is right.
What is the right way to clean up a Spark session in this case, when it's created explicitly per API request? Do nothing, like I am doing now, OR call STOP? Please suggest.
a week ago
Hi,
You can use the Databricks SQL Connector for this instead of Databricks Connect if you need a simple setup and easier management (of sessions, etc.).
More details here
Apache Spark generally requires you to explicitly declare that jobs are complete by using commands such as sys.exit() or sc.stop(). Databricks automatically terminates and cleans up jobs as they reach completion, so these commands are not necessary and should be removed. The automatic cleanup occurs when the request completes.
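For reference, here is a minimal sketch of what the SQL Connector route could look like, assuming the databricks-sql-connector package is installed. The helper name run_query and its connect parameter are made up for illustration, and the hostname, HTTP path, and token in the usage comment are placeholders. The connection and cursor are both used as context managers, so they are closed when the request finishes, even on error.

```python
def run_query(query, connect=None, **connection_args):
    """Run one SQL statement and return all rows, closing everything afterwards.

    `connect` is a hypothetical injection point (handy for tests); by default
    the real connector's `sql.connect` is used.
    """
    if connect is None:
        # Imported lazily so the helper can be loaded without the package.
        from databricks import sql
        connect = sql.connect

    # Both context managers close their resource on exit, success or failure.
    with connect(**connection_args) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            return cursor.fetchall()


# Example call (placeholder values, not real endpoints):
# rows = run_query(
#     "SELECT 1",
#     server_hostname="<workspace-hostname>",
#     http_path="<sql-warehouse-http-path>",
#     access_token="<personal-access-token>",
# )
```

This only covers plain SQL; if the API needs DataFrame operations, Databricks Connect is still the tool to use.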
a week ago
Hi there,
I would call spark.stop() when you're done with each session. What you're doing now (not calling it) works, but it's not ideal: you're relying on the server-side idle timeout to clean up after you, and in the meantime each orphaned session consumes memory on the cluster for its SQLConf and SessionState. On a busy API with many requests, that can accumulate until the cluster eventually reclaims them.
The documentation's advice not to call stop() is aimed at a different scenario: specifically, when you're running inside a Databricks notebook or workspace environment where the session lifecycle is managed for you. In that context, calling stop() can tear down shared infrastructure you didn't create. It doesn't apply to your situation, where you're an external client creating sessions explicitly via Databricks Connect from an AKS pod. (Databricks Connect in notebooks)
There is also an atexit handler that calls stop() on active sessions when the Python process terminates. If you were running a short-lived script (start, do work, exit), this would clean things up for you automatically. However, your API is a long-lived server process, so it doesn't exit between requests. The shutdown hook only fires when the pod itself restarts or scales down, not after each request completes.
What spark.stop() actually does in Databricks Connect
Calling stop() on a Databricks Connect session sends a ReleaseSession RPC to the server, so the session is released right away instead of waiting for the idle timeout.
stop() is also idempotent: calling it on an already-closed or expired session won't throw an error. So it's safe to call in a finally block without worrying about race conditions with the idle timeout. (Databricks Connect release notes)
from databricks.connect import DatabricksSession

def handle_request():
    spark = DatabricksSession.builder.create()  # new session per request
    try:
        # your Spark work here
        result = spark.sql("SELECT ...")
        return result.collect()
    finally:
        spark.stop()  # clean up immediately
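If several handlers need the same try/finally, one option is to wrap it in a context manager. This is only a sketch: request_session and the factory parameter are names invented here, not part of Databricks Connect. The factory argument exists so the wrapper can be tried out with a stand-in session object; by default it creates a real session with DatabricksSession.builder.create().

```python
from contextlib import contextmanager


def _new_databricks_session():
    # Imported lazily so this module loads even where databricks-connect
    # is not installed (for example in unit tests).
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.create()


@contextmanager
def request_session(factory=_new_databricks_session):
    """Yield a fresh session and always call stop() on it afterwards."""
    spark = factory()
    try:
        yield spark
    finally:
        # Runs on normal exit and when the body raises.
        spark.stop()


# Example use inside a request handler:
# with request_session() as spark:
#     rows = spark.sql("SELECT ...").collect()
```

The behaviour is the same as the explicit finally block above; the wrapper just keeps the cleanup in one place.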
Use .create() rather than .getOrCreate(): you've already figured this out. The create() API was introduced in 16.0 specifically for this use case (always creates a fresh session rather than returning an existing one). (Databricks Connect release notes)
Call stop() in a finally block so it runs even if your Spark work throws an exception.
Databricks Connect release notes: where the create() API, idempotent stop(), and session expiry handling are documented
Databricks Connect in notebooks: where stop() should not be called (i.e. not your case)