- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
I have API that triggers Spark calculations - with API hosted by Python 3.12 pod in AKS and connects to Databricks cluster using Databricks 18.1.1.
Initially I was using getOrCreate call on my API requests and all works.
But problem is - as Spark session is shared.. after a while when new API request comes in, it fails with "INVALID SESSION" - this is because Cluster expired the Spark session after waiting enough due to inactivity - looks like.
I also felt like sharing same Spark session is not my intention as I was to isolate each API call/request and create NEW Spark session per request.
So now I am using create NEW Spark session per API request and dont have any issues.
But do I need to do any clean up?
If I am creating NEW session , I feel tempted to call Stop session or do any clean up once I am done.
But documentation seems to suggest never explicitly call STOP Session.
I am not seeing any issues so far by not calling STOP but not sure if this causes any resource leaks and want to do what is right?
What is right way to clean up Spark Session in this case when its created explicitly per API request? Do nothing like what I am doing OR call STOP - please suggest
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
Hi,
You can use Databricks SQL Connector for the activities instead of Databricks Connect if you require simple setup & easier management (session etc) with features below
- SQL queries (SELECT, INSERT, UPDATE)
- Executing parameterized SQL statements
- Fetching data and doing transformations in Python after retrieval
More details here
Apache Spark generally requires you to explicitly declare that they are complete by using commands such as sys.exit() or sc.stop(). Databricks automatically terminates and cleans up jobs as they reach completion, so these commands are not necessary and should be removed. The automatic cleanup occurs when the request completes.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
a week ago
Hi there,
I
Short answer
spark.stop() when you're done with each session. What you're doing now (not calling it) works, but it's not ideal — you're relying on the server-side idle timeout to clean up after you, and in the meantime each orphaned session consumes memory on the cluster for its SQLConf and SessionState. On a busy API with many requests, that can accumulate until the cluster eventually reclaims them.Why the docs say "don't call stop"
stop() is aimed at a different scenario — specifically, when you're running inside a Databricks notebook or workspace environment where the session lifecycle is managed for you. In that context, calling stop() can tear down shared infrastructure you didn't create. It doesn't apply to your situation, where you're an external client creating sessions explicitly via Databricks Connect from an AKS pod. (Databricks Connect in notebooks)What about automatic cleanup?
-
Process exit / shutdown hooks: PySpark registers an
atexithandler that callsstop()on active sessions when the Python process terminates. If you were running a short-lived script (start, do work, exit), this would clean things up for you automatically. However, your API is a long-lived server process — it doesn't exit between requests. The shutdown hook only fires when the pod itself restarts or scales down, not after each request completes. -
Server-side idle timeout: The Spark Connect server passively cleans up idle sessions after a period of inactivity. The release notes confirm: "Databricks Connect now automatically closes expired sessions on the client side." So sessions do eventually get reclaimed — but in the meantime they're sitting there consuming driver memory. (Databricks Connect release notes)
What spark.stop() actually does in Databricks Connect
stop() on a Databricks Connect session sends a ReleaseSession RPC to the server, which:- Interrupts any running operations tied to that session
- Releases server-side resources (memory, cached state)
- Closes the gRPC channel on the client side
stop() is also idempotent — calling it on an already-closed or expired session won't throw an error. So it's safe to call in a finally block without worrying about race conditions with the idle timeout. (Databricks Connect release notes)Recommended pattern
from databricks.connect import DatabricksSession
def handle_request():
spark = DatabricksSession.builder.create() # new session per request
try:
# your Spark work here
result = spark.sql("SELECT ...")
return result.collect()
finally:
spark.stop() # clean up immediately
- Use
.create()rather than.getOrCreate()— you've already figured this out. Thecreate()API was introduced in 16.0 specifically for this use case (always creates a fresh session rather than returning an existing one). (Databricks Connect release notes) - Wrap
stop()in afinallyblock so it runs even if your Spark work throws an exception. - If you're on 18.1.1 as you mentioned, you have all the idempotent-stop and transient-retry improvements, so this is straightforward.
What happens if you don't call stop
Relevant docs
- Databricks Connect overview — general setup and architecture
- Databricks Connect release notes — where the
create()API, idempotentstop(), and session expiry handling are documented - Compute configuration for Databricks Connect — session builder configuration options
- Databricks Connect in notebooks (workspace behaviour) — explains when
stop()should not be called (i.e. not your case) - Spark Connect vs Spark Classic — best practices for the Spark Connect protocol your client uses