Get Started Discussions

Databricks connect and spark session best practice

thibault
Contributor

Hi all!

I am using databricks-connect to develop and test PySpark code in pure Python files (not notebooks) in my local IDE, running against a Databricks cluster. These files are part of a deployment setup with dbx, so they run as tasks in a workflow.

Everything works fine, but there is this piece of code to decide whether to create a Databricks Connect Spark session or to reuse the Spark session already running on Databricks as part of a job:

try:
    # Local development: databricks-connect is installed, so create a
    # remote session against the configured Databricks cluster.
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
except ImportError:
    # Running on Databricks itself: databricks-connect is not available,
    # so fall back to the cluster's regular SparkSession.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

That feels like a code smell. Is there a nicer way you would recommend to handle the Spark session, whether running locally via databricks-connect or directly on Databricks?


Kaniz
Community Manager

Hi @thibault, the code provided determines whether to create a Databricks Connect Spark session or to reuse the Spark session running on Databricks as part of a job.

However, it can be improved to handle the Spark session more cleanly. Instead of using a try-except block to import DatabricksSession and falling back to SparkSession, you can use a conditional statement to check whether DatabricksSession is available.

If available, create a Databricks Connect Spark session; otherwise, create a regular Spark session.
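A minimal sketch of that conditional approach, probing for the databricks-connect package with importlib instead of catching ImportError (the helper name `get_spark` is just illustrative, not part of any Databricks API):

```python
import importlib.util


def get_spark():
    """Return a Databricks Connect session when running locally,
    or the cluster's regular SparkSession when running on Databricks."""
    # Check the parent package first so find_spec on the dotted name
    # cannot raise ModuleNotFoundError when "databricks" is absent.
    if (
        importlib.util.find_spec("databricks") is not None
        and importlib.util.find_spec("databricks.connect") is not None
    ):
        # databricks-connect is installed -> local development
        from databricks.connect import DatabricksSession
        return DatabricksSession.builder.getOrCreate()
    # No databricks-connect -> assume we are on a Databricks cluster
    from pyspark.sql import SparkSession
    return SparkSession.builder.getOrCreate()
```

Functionally this is equivalent to the original try/except; whether an explicit availability check reads better than EAFP-style exception handling is largely a matter of taste, though wrapping the choice in a single helper does keep the branching out of the rest of the code.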


thibault
Contributor

Thanks for your response, @Kaniz. Can you elaborate on the difference between your suggestion and the code I provided? I.e., what would your if-else look like?