Databricks Connect and Spark session best practice
09-07-2023 10:46 PM
Hi all!
I am using databricks-connect to develop and test PySpark code in pure Python (not notebook) files in my local IDE, running against a Databricks cluster. These files are part of a deployment setup with dbx, so they are run as tasks in a workflow.
Everything works fine, but then there is this piece of code that decides whether to create a Databricks Connect Spark session or reuse the Spark session already running on Databricks as part of a job:
```python
try:
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
except ImportError:
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
```
That feels like a code smell. Is there a nicer way you would recommend to handle the Spark session, whether running locally via databricks-connect or directly on Databricks?
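One option I have considered is wrapping the fallback in a small helper so the task code only ever calls a single function; the helper name is just illustrative:

```python
from pyspark.sql import SparkSession


def get_spark():
    """Return a Databricks Connect session locally, or the cluster session on Databricks."""
    try:
        # databricks-connect is only installed in the local dev environment
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    except ImportError:
        # On a Databricks cluster, fall back to the regular session
        return SparkSession.builder.getOrCreate()


spark = get_spark()
```

That at least confines the try/except to one place, but it is still the same ImportError-driven branching underneath.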
09-13-2023 12:07 AM
Thanks for your response, @Retired_mod. Can you elaborate on the difference between your suggestion and the code I provided? That is, what would your if-else look like?
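For concreteness, I imagine something like the following, assuming the branch condition is an environment check rather than an ImportError, e.g. via the DATABRICKS_RUNTIME_VERSION variable (which, as far as I know, is set on Databricks clusters but not locally):

```python
import os

from pyspark.sql import SparkSession


def get_spark():
    # Assumption: DATABRICKS_RUNTIME_VERSION is present on Databricks clusters
    # and absent on a local machine, so it can replace ImportError-based branching.
    if "DATABRICKS_RUNTIME_VERSION" in os.environ:
        # Running as a job on a cluster: reuse the existing session
        return SparkSession.builder.getOrCreate()
    # Running locally: build a session through Databricks Connect
    from databricks.connect import DatabricksSession
    return DatabricksSession.builder.getOrCreate()
```

Is that roughly what you had in mind, or were you suggesting a different condition?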

