Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to instantiate Databricks spark context in a python script?

New Contributor III

I want to run a block of code in a script, not in a notebook, on Databricks. However, I cannot instantiate the Spark context without getting an error.

I have tried `SparkContext.getOrCreate()`, but this does not work.

Is there a simple way to do this I am missing?


Esteemed Contributor III

Hi @Andrej Erkelens,

Can you please send the error you are getting when using the above code?

Community Manager

Hi @Andrej Erkelens, to instantiate a Spark context in a Python script that will run outside of a Databricks notebook, you can use the PySpark library, which provides an interface for interacting with Spark in Python.

Here's an example of how to instantiate a Spark context in a Python script:

from pyspark import SparkContext, SparkConf

# Set up Spark configuration
conf = SparkConf().setAppName("MyApp")
sc = SparkContext(conf=conf)

# Your Spark code here

# Stop the Spark context
sc.stop()

In this example, we first import the SparkContext and SparkConf classes from the pyspark module.

We then create a new SparkConf object with an application name and pass it to the SparkContext constructor to create a new Spark context. You can then add your Spark code between the SparkContext instantiation and the sc.stop() call at the end.

It's important to note that when you instantiate a Spark context in a Python script, you'll need to explicitly manage the context's lifecycle, including starting and stopping it. This is because there is no automatic context management when running a Python script outside a Databricks notebook.

New Contributor III

@Kaniz Fatma

Hi, I have tried this but receive an error:

`RuntimeError: A master URL must be set in your configuration`

Is there something I am missing to use a Databricks cluster (AWS backend) in a .py script?


I have the same problem, and would be interested in a solution

New Contributor III

I tried doing this and got a master URL and app name error. I tried setting those and got an error message telling me not to create a Spark session in Databricks and to use the `SparkContext.getOrCreate()` method instead.
But that leads to the same error. I used the `getActiveSession()` method to verify that the Python script does not have access to a Spark session.
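For anyone wanting to repeat that check, here is a minimal sketch of a diagnostic that can be dropped into the script. The function name `describe_spark_state` is made up for illustration. It assumes PySpark 3.0 or later, where `SparkSession.getActiveSession()` is available, and it only inspects the current process; it does not try to create or attach to a session.

```python
def describe_spark_state():
    """Return a short summary of what Spark state this process can see."""
    try:
        # Imported lazily so the check also works where pyspark is absent.
        from pyspark.sql import SparkSession
    except ImportError:
        return "pyspark is not importable in this interpreter"

    # getActiveSession() only looks up an existing session; it never creates one.
    session = SparkSession.getActiveSession()
    if session is None:
        return "pyspark is importable, but no active SparkSession is visible here"
    return f"active SparkSession found (Spark {session.version})"


if __name__ == "__main__":
    print(describe_spark_state())
```

Running the same function from a notebook cell and from the launched script should make the difference described above visible: the notebook process reports a session, while a separately started Python process does not.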

New Contributor II

Did this ever get addressed? I would like to use a Databricks notebook to launch a python-based child process (os.popen) that itself ultimately needs to use pyspark. When I try this, I either get told to supply a Master URL to the Spark context, or if I apply local[*] as master, I get told in an exception message on Spark interaction that Notebooks should use the shared context available via sc. This code is executing in a standalone python library being run by the subprocess (based on python, but not just a python script) launched from Notebook.

Is it simply disallowed to access Spark outside of the shared context sc? If so, how can we access that shared context from a standalone python library as I describe?

New Contributor II

Thanks for information


Contributor III

Try this, and pay attention to the import:

from pyspark.sql import SparkSession

appName = "abc"

# Create a SparkSession (or reuse an existing one)
spark = SparkSession.builder \
    .appName(appName) \
    .getOrCreate()

# Your PySpark code goes here

# Stop the SparkSession when done
spark.stop()

Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
United Kingdom

view my Linkedin profile

Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, quote "one test result is worth one-thousand expert opinions (Wernher von Braun)".

Thanks @MichTalebzadeh, but I have tried this to no avail. I get a [MASTER_URL_NOT_SET] error, and when I try to set it, I get an error stating I cannot create another Spark session. `getActiveSession()` returns None from within the script, but returns the session when called from the notebook.

Contributor III

Thanks. Please send the full details of the error you are getting.


Sadly, I have the errors in my corporate environment and I can't show the exact error details from this account. But it is quite close to that of @testing3.

Contributor III

I came across a similar issue. 

Please detail how you are executing the Python script. Are you calling it from the web terminal or from a notebook?

Note: If you are calling it from the web terminal, your Spark session won't be passed. You could create a local variable and pass it in if you'd like. I have never gotten to that point yet, though.
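One way to read "pass it in" is to have the script expose a function that takes the session as an argument instead of creating its own. The sketch below assumes a hypothetical module `my_job.py` and a hypothetical function `run_job`; neither comes from this thread. It only relies on `spark.range()` and `DataFrame.count()`, which are standard PySpark calls.

```python
# my_job.py  (hypothetical module name, used only for illustration)

def run_job(spark, n=5):
    """Count the rows of a small generated DataFrame using the caller's session."""
    # spark.range(n) builds a single-column DataFrame with n rows (ids 0..n-1).
    df = spark.range(n)
    return df.count()
```

From a notebook cell this would be called as `from my_job import run_job` followed by `run_job(spark)`, where `spark` is the session the notebook already provides. This only helps when the code is imported into the notebook's own Python process. A script started as a separate process (for example with `!` or `os.popen`) cannot receive the session object this way.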

New Contributor III

I am running the script from a Databricks notebook: !streamlit run


I believe running a Streamlit app directly from a Databricks notebook using !streamlit run <python_code>.py is not the way to do it, because Databricks notebooks are not designed to host Streamlit.

OK, try the below.

Create a Databricks Runtime with Streamlit pre-installed. Configure the cluster: when creating a new Databricks cluster, select a runtime that includes Streamlit pre-installed. This eliminates the installation step.

Run the script: Within the notebook cell, simply execute the script directly:

!streamlit run


