Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How can I start a SparkSession outside of a notebook?

NCat
New Contributor III

Hi community,

How can I start a SparkSession outside of a notebook?
I want to split my notebook into small Python modules, and I want some of them to be able to call Spark functionality.

7 REPLIES

-werners-
Esteemed Contributor III

Can you elaborate a bit more?
Are you going to call those modules in a notebook and use Spark functions in them?
Or do you want to explicitly start a separate SparkSession for each module?

sakhulaz
New Contributor II

Hello,

To start a SparkSession outside of a notebook, you can follow these steps to split your code into small Python modules and utilize Spark functionality:

  1. Import Required Libraries: In your Python module, import the necessary libraries for Spark:

from pyspark.sql import SparkSession

  2. Create SparkSession: Initialize the SparkSession at the beginning of your module:

spark = SparkSession.builder \
    .appName("YourAppName") \
    .config("spark.some.config.option", "config-value") \
    .getOrCreate()

Customize the configuration options as needed.
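
Wrapped in a small helper, every module can obtain the session the same way. A minimal sketch (the module and function names are placeholders); note that getOrCreate() returns an already-running session, such as the one Databricks starts for you, rather than creating a second one:

# spark_session.py (hypothetical helper module)
from pyspark.sql import SparkSession

def get_spark(app_name="YourAppName"):
    # Returns the existing session if one is active (e.g. on Databricks),
    # otherwise builds a new one with the given app name.
    return SparkSession.builder.appName(app_name).getOrCreate()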

Tharun-Kumar
Databricks Employee

@NCat 

Databricks provides a SparkSession out of the box. You just have to use the variable "spark".


To use it in other modules, pass the spark variable to them as a parameter.
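
For example, a minimal sketch of that pattern (module, function, and path are hypothetical):

# my_module.py (hypothetical)
def row_count(spark, path):
    # Use the session handed in by the notebook.
    return spark.read.parquet(path).count()

And in the notebook:

from my_module import row_count
n = row_count(spark, "path/to/events")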

NCat
New Contributor III

Thank you for all replies.
@-werners- I want to use the SparkSession in modules that are called from a notebook.

@sakhulaz How can I get the config options needed to connect to the Databricks data?

@Tharun-Kumar Thank you. That approach definitely works for my situation!

-werners-
Esteemed Contributor III

In general (as already stated), a notebook automatically gets a SparkSession.
You don't have to do anything.
If you specifically need separate sessions (isolation), you should run different notebooks (or schedule different jobs), as each of these gets a new session (one session per notebook/job).
Magic commands like %scala, %run, etc. use the same SparkSession, so there is no isolation there.

benrich
New Contributor II

To start a SparkSession outside of a Jupyter Notebook and enable its use in multiple Python modules, follow these steps:

  1. Install Apache Spark: Ensure Spark is installed on your system. You can download it from the Apache Spark website and set it up with Hadoop, or use a standalone cluster.

  2. Set Up Environment Variables: Configure the necessary environment variables (SPARK_HOME, JAVA_HOME, and PYTHONPATH) to point to the correct locations.

  3. Create a Spark Configuration Module: Create a Python file (e.g., spark_config.py) to set up the SparkSession:

    from pyspark.sql import SparkSession

    def create_spark_session(app_name="MyApp"):
        spark = SparkSession.builder \
            .appName(app_name) \
            .getOrCreate()
        return spark
  4. Initialize SparkSession in Your Modules: Import and use the create_spark_session function in your Python modules to get the SparkSession:

    from spark_config import create_spark_session

    spark = create_spark_session("ModuleName")

    # Now you can use Spark functionality, e.g.:
    df = spark.read.csv("path/to/data.csv")
    df.show()
  5. Run Your Modules: Execute your Python scripts or modules from the command line or within a larger application, and the SparkSession will be initialized and used as needed (see the sketch below).
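
As a sketch of step 5, a standalone script built on the spark_config module from step 3 (file name and data path are placeholders) could look like this, run with python my_job.py or spark-submit my_job.py:

# my_job.py (hypothetical)
from spark_config import create_spark_session

def main():
    spark = create_spark_session("MyJob")
    df = spark.read.csv("path/to/data.csv", header=True)
    df.show()
    spark.stop()  # release resources when running standalone

if __name__ == "__main__":
    main()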


jacovangelder
Honored Contributor

Just take over the Databricks SparkSession:

from pyspark.sql import SparkSession
spark = SparkSession.getActiveSession()
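
If the same module might also run outside a notebook, a defensive variant is possible (just a sketch; getActiveSession() returns None when no session is active):

from pyspark.sql import SparkSession

def get_session():
    # Reuse the session Databricks started for the notebook,
    # or build one when running standalone.
    return SparkSession.getActiveSession() or SparkSession.builder.getOrCreate()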
