Databricks connect and spark session best practice

thibault
Contributor

Hi all!

I am using databricks-connect to develop and test PySpark code in pure Python files (not notebooks) in my local IDE, running against a Databricks cluster. These files are part of a deployment setup with dbx, so they run as tasks in a workflow.

Everything works fine, but then there is this piece of code to decide whether to create a Databricks Connect Spark session or reuse the Spark session already running in Databricks as part of a job:

try:
    # Local IDE: databricks-connect is installed, so connect to the remote cluster.
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
except ImportError:
    # On Databricks: databricks-connect is absent, so reuse the cluster's session.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

And that feels like a code smell. Is there a nicer way you would recommend to handle the Spark session, whether running locally via databricks-connect or directly on Databricks?

2 REPLIES

Kaniz_Fatma
Community Manager

Hi @thibault, the code you provided determines whether to create a Databricks Connect Spark session or reuse the Spark session already running in Databricks as part of a job.

However, it can be handled more cleanly. Instead of using a try/except block to import DatabricksSession and falling back to SparkSession, you can use a conditional statement to check whether DatabricksSession is available.

If it is available, create a Databricks Connect Spark session; otherwise, create a regular Spark session.
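For illustration, a minimal sketch of that if/else, assuming databricks-connect is installed only in your local environment and not on the cluster (importlib.util.find_spec is one way to test whether the module is available without importing it):

import importlib.util

from pyspark.sql import SparkSession

if importlib.util.find_spec("databricks.connect") is not None:
    # databricks-connect is installed, so we are running locally in the IDE.
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
else:
    # No databricks-connect, so we are on a cluster: reuse the job's session.
    spark = SparkSession.builder.getOrCreate()

Functionally this does the same thing as the try/except; the explicit check just makes the two environments visible at a glance instead of relying on an import failure.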

thibault
Contributor

Thanks for your response, @Kaniz_Fatma. Can you elaborate on the difference between your suggestion and the code I provided, i.e. what makes the explicit if/else preferable to the try/except?
