Re: Databricks Connect - Will I ever have to Stop ...

emma_s · a week ago

Hi there,


Short answer

You should call spark.stop() when you're done with each session. What you're doing now (not calling it) works, but it's not ideal: you're relying on the server-side idle timeout to clean up after you, and in the meantime each orphaned session consumes memory on the cluster for its SQLConf and SessionState. On a busy API with many requests, those orphaned sessions can pile up until the cluster eventually reclaims them.

Why the docs say "don't call stop"

The documentation warning about not calling stop() is aimed at a different scenario — specifically, when you're running inside a Databricks notebook or workspace environment where the session lifecycle is managed for you. In that context, calling stop() can tear down shared infrastructure you didn't create. It doesn't apply to your situation, where you're an external client creating sessions explicitly via Databricks Connect from an AKS pod. (Databricks Connect in notebooks)

What about automatic cleanup?

You may have read that Databricks Connect handles session cleanup automatically — and that's partially true. There are two mechanisms:

Process exit / shutdown hooks: PySpark registers an atexit handler that calls stop() on active sessions when the Python process terminates. If you were running a short-lived script (start, do work, exit), this would clean things up for you automatically. However, your API is a long-lived server process — it doesn't exit between requests. The shutdown hook only fires when the pod itself restarts or scales down, not after each request completes.
Server-side idle timeout: The Spark Connect server passively cleans up idle sessions after a period of inactivity. The release notes describe the matching client-side behaviour: "Databricks Connect now automatically closes expired sessions on the client side." So sessions do eventually get reclaimed, but until then they sit there consuming driver memory. (Databricks Connect release notes)

For a long-running API server creating a new session per request, neither mechanism gives you prompt cleanup. You'd accumulate sessions until the timeout kicks in.
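To make the exit-hook point concrete, here is a plain-Python illustration that does not touch Databricks at all. FakeSession, open_sessions and handle_request are hypothetical stand-ins I made up for this sketch; the only real API used is the standard library's atexit module.

```python
import atexit

cleaned_up = []      # names of sessions the exit hook has stopped
open_sessions = []   # sessions created by request handlers


class FakeSession:
    """Stand-in for a remote session (hypothetical, not a Databricks class)."""

    def __init__(self, name):
        self.name = name

    def stop(self):
        cleaned_up.append(self.name)


def _stop_all_at_exit():
    # Runs once, when the interpreter shuts down.
    for session in open_sessions:
        session.stop()


atexit.register(_stop_all_at_exit)


def handle_request(i):
    # Simulates a handler that creates a session and never stops it.
    session = FakeSession(f"request-{i}")
    open_sessions.append(session)
    return session.name


for i in range(3):
    handle_request(i)

# The process is still alive, so the exit hook has not run yet.
print(len(open_sessions), len(cleaned_up))  # prints "3 0"
```

In a long-lived server the loop never ends, so the hook only helps when the pod itself goes away.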

What `spark.stop()` actually does in Databricks Connect

Since version 14.2.0, calling stop() on a Databricks Connect session sends a ReleaseSession RPC to the server, which:

Interrupts any running operations tied to that session
Releases server-side resources (memory, cached state)
Closes the gRPC channel on the client side

Since version 15.1.0, stop() is also idempotent — calling it on an already-closed or expired session won't throw an error. So it's safe to call in a finally block without worrying about race conditions with the idle timeout. (Databricks Connect release notes)
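If some of your services are still on a client older than 15.1.0, where stop() may raise on an already-expired session, one option is a small defensive wrapper. This is only a sketch: safe_stop is a hypothetical helper name, not part of Databricks Connect, and it assumes nothing about the session beyond it having a stop() method.

```python
import logging

logger = logging.getLogger(__name__)


def safe_stop(session):
    """Call session.stop(); log and return False instead of raising on failure."""
    try:
        session.stop()
        return True
    except Exception:
        # Assumption: a failed stop() is not fatal, because the server-side
        # idle timeout will still reclaim the session eventually.
        logger.warning("session.stop() failed; leaving cleanup to the idle timeout",
                       exc_info=True)
        return False
```

On 15.1.0 and later this wrapper should be unnecessary, since stop() is already safe to call twice.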

Recommended pattern

from databricks.connect import DatabricksSession

def handle_request():
    spark = DatabricksSession.builder.create()  # new session per request
    try:
        # your Spark work here
        result = spark.sql("SELECT ...")
        return result.collect()
    finally:
        spark.stop()  # clean up immediately

A few notes:

Use .create() rather than .getOrCreate() — you've already figured this out. The create() API was introduced in 16.0 specifically for this use case (always creates a fresh session rather than returning an existing one). (Databricks Connect release notes)
Wrap stop() in a finally block so it runs even if your Spark work throws an exception.
If you're on 18.1.1 as you mentioned, you have all the idempotent-stop and transient-retry improvements, so this is straightforward.
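If you have several handlers, you could also package the try/finally into a context manager so nobody forgets the stop() call. This is a minimal sketch rather than an official pattern: spark_session is a hypothetical name, and the factory argument is my own addition so the helper can be exercised without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def spark_session(factory):
    """Create a session with factory(), yield it, and always stop it afterwards."""
    session = factory()
    try:
        yield session
    finally:
        session.stop()  # runs on normal exit and when the body raises


# Intended usage with the real client (not executed here; assumes
# databricks-connect is installed and configured):
#
#   from databricks.connect import DatabricksSession
#
#   def handle_request():
#       with spark_session(lambda: DatabricksSession.builder.create()) as spark:
#           return spark.sql("SELECT ...").collect()
```

Collect the results inside the with block, as in the original pattern, because the session is gone once the block exits.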

What happens if you don't call stop

It's not catastrophic — the server will eventually clean up idle sessions via the timeout. But you'll accumulate orphaned sessions in the interim, each holding memory on the driver. Under sustained load this can contribute to driver memory pressure.

That said, if your request volume is modest and you're not seeing issues, the idle timeout is probably handling things adequately. It's more of a "doing it properly" thing than a "this will definitely break" thing.

Relevant docs

Databricks Connect overview — general setup and architecture
Databricks Connect release notes — where the create() API, idempotent stop(), and session expiry handling are documented
Compute configuration for Databricks Connect — session builder configuration options
Databricks Connect in notebooks (workspace behaviour) — explains when stop() should not be called (i.e. not your case)
Spark Connect vs Spark Classic — best practices for the Spark Connect protocol your client uses
