Data Engineering

How can I start SparkSession out of Notebook?

NCat
New Contributor III

Hi community,

How can I start SparkSession out of Notebook?
I want to split my Notebook into small Python modules, and I want some of them to call Spark functionality.

5 REPLIES

-werners-
Esteemed Contributor III

Can you elaborate a bit more?
Are you going to call those modules from a notebook and use Spark functions in them?
Or do you want to explicitly start a separate SparkSession for each module?

sakhulaz
New Contributor II

Hello,

To start a SparkSession outside of a notebook, you can follow these steps to split your code into small Python modules and utilize Spark functionality:

  1. Import Required Libraries: In your Python module, import the necessary libraries for Spark:

from pyspark.sql import SparkSession
  2. Create SparkSession:

Initialize the SparkSession at the beginning of your module:

spark = SparkSession.builder \
    .appName("YourAppName") \
    .config("spark.some.config.option", "config-value") \
    .getOrCreate()

Customize the configuration options as needed.
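A minimal sketch of what such a module could look like (the module name, function names, and table name below are made up for illustration, not from this thread). Note that on a Databricks cluster getOrCreate() simply returns the session the cluster already provides, so the builder settings mainly matter when the code runs outside Databricks:

# spark_helpers.py (hypothetical module name)
from pyspark.sql import SparkSession, DataFrame

def get_spark() -> SparkSession:
    # Return the active SparkSession, creating one only if none exists.
    # On a Databricks cluster this just hands back the built-in session.
    return (
        SparkSession.builder
        .appName("YourAppName")
        .getOrCreate()
    )

def load_table(table_name: str) -> DataFrame:
    # Example helper that uses the session internally.
    spark = get_spark()
    return spark.table(table_name)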

Tharun-Kumar
Honored Contributor II

@NCat 

Databricks provides a SparkSession out of the box. You just have to use the variable "spark". 

[Screenshot attached: Screenshot 2023-08-09 at 5.52.07 PM.png]

To use it in other modules, you have to pass the spark variable to them as a parameter.
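A rough sketch of that approach (the module name, function name, and table name are hypothetical placeholders, not from this thread):

# transforms.py (hypothetical module)
from pyspark.sql import SparkSession

def row_count(spark: SparkSession, table_name: str) -> int:
    # Use the SparkSession handed in by the caller (the notebook).
    return spark.table(table_name).count()

# In the notebook, `spark` already exists, so just pass it along:
# from transforms import row_count
# n = row_count(spark, "my_catalog.my_schema.my_table")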

NCat
New Contributor III

Thank you for all replies.
@-werners- I want to use the SparkSession in modules which are called from the Notebook.

@sakhulaz How can I find the config options needed to connect to the Databricks data?

@Tharun-Kumar Thank you. That approach definitely works for my situation!

-werners-
Esteemed Contributor III

In general (as already stated) a notebook automatically gets a SparkSession.
You don't have to do anything.
If you specifically need separate sessions (isolation), you should run different notebooks (or plan different jobs), as these each get a new session (one session per notebook/job).
Magic commands like %scala, %run, etc. use the same SparkSession, so there is no isolation there.
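If passing the spark variable around explicitly feels clumsy, one possible alternative (assuming PySpark 3.0+, which exposes SparkSession.getActiveSession()) is to let the module pick up the notebook's existing session itself; the module and function names below are just examples:

# session_utils.py (hypothetical module)
from pyspark.sql import SparkSession

def active_spark() -> SparkSession:
    # getActiveSession() returns the session the calling notebook/job
    # is already using, or None if no session is active.
    spark = SparkSession.getActiveSession()
    if spark is None:
        raise RuntimeError("No active SparkSession; call this from a notebook or job")
    return spark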
