cancel
Showing results for 
Search instead for 
Did you mean: 
Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.
cancel
Showing results for 
Search instead for 
Did you mean: 

Databricks connect and spark session best practice

thibault
Contributor III

Hi all!

I am using databricks-connect to develop and test pyspark code pure python (not notebook) files in my local IDE, running on a Databricks cluster. These files are part of a deployment setup with dbx so that they are run as tasks in a workflow.

Everything works fine, but then there is this piece, to know whether to create a databricks connect spark session or reuse the spark session running in Databricks as part of a job :

try:
    from databricks.connect import DatabricksSession
    spark = DatabricksSession.builder.getOrCreate()
except ImportError:
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()

And that feels like code smell. Is there a nicer way you would recommend to handle the spark session, whether running locally via databricks-connect or directly on Databricks?

1 REPLY 1

thibault
Contributor III

Thanks for your response @Retired_mod . Can you elaborate on the difference between your suggestion and the code I provided? i.e. what would your if-else look like?

Join Us as a Local Community Builder!

Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!

Sign Up Now