How to instantiate Databricks spark context in a python script?

ae20cg
New Contributor III

I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot instantiate the Spark context without hitting an error.

I have tried `SparkContext.getOrCreate()`, but this does not work.

Is there a simple way to do this that I am missing?

17 REPLIES

Ajay-Pandey
Esteemed Contributor III

Hi @Andrej Erkelens,

Could you please share the error you are getting when using the above code?

Kaniz
Community Manager

Hi @Andrej Erkelens​, To instantiate a Spark context in a Python script that will run outside of a Databricks notebook, you can use the PySpark library, which provides an interface for interacting with Spark in Python.

Here's an example of how to instantiate a Spark context in a Python script:

from pyspark import SparkContext, SparkConf
 
# Set up Spark configuration
conf = SparkConf().setAppName("MyApp")
sc = SparkContext(conf=conf)
 
# Your Spark code here
 
# Stop the Spark context
sc.stop()

In this example, we first import the SparkContext and SparkConf classes from the pyspark module.

We then create a new SparkConf object with an application name and pass it to the SparkContext constructor to create a new Spark context. You can then add your Spark code between the SparkContext instantiation and the sc.stop() call at the end to execute your code.

It's important to note that when you instantiate a Spark context in a Python script, you'll need to explicitly manage the context's lifecycle, including starting and stopping it. This is because there is no automatic context management when running a Python script outside a Databricks notebook.
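One way to make the same script work both on a Databricks cluster (where a session already exists) and standalone is to reuse the active session when there is one. Below is a minimal sketch, assuming PySpark is installed in the standalone environment; the `get_spark` helper name and the `local[*]` fallback master are illustrative, not part of the answer above:

```python
def get_spark(app_name="MyApp"):
    """Return the active SparkSession if one exists (as on a Databricks
    cluster); otherwise build a local session for standalone runs."""
    # Imported inside the function so the helper can live in a module
    # that is also imported in environments without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.getActiveSession()
    if spark is None:
        spark = (
            SparkSession.builder
            .appName(app_name)
            .master("local[*]")  # assumption: local fallback for dev runs
            .getOrCreate()
        )
    return spark
```

With this pattern the script never constructs a second context on Databricks, which is what triggers the "cannot create another session" errors reported later in this thread.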

ae20cg
New Contributor III

@Kaniz Fatma

Hi, I have tried this but receive an error

`RuntimeError: A master URL must be set in your configuration`

Is there something I am missing in order to use a Databricks cluster (AWS backend) from a Python script?

Thanks

I have the same problem and would be interested in a solution.

Spartan101
New Contributor III

Hi,
I tried this and got a master URL and app name error. When I set those, I got an error message asking me not to create a Spark session on Databricks and to use the `SparkContext.getOrCreate()` method instead. But that leads to the same error. I used the `getActiveSession()` method to verify that the Python script does not have access to a Spark session.

testing3
New Contributor II

Did this ever get addressed? I would like to use a Databricks notebook to launch a Python-based child process (via os.popen) that itself ultimately needs to use PySpark. When I try this, I am either told to supply a master URL to the Spark context, or, if I set `local[*]` as the master, an exception at the first Spark interaction tells me that notebooks should use the shared context available via `sc`. The code runs in a standalone Python library (Python-based, but not just a single script) launched as a subprocess from the notebook.

Is it simply disallowed to access Spark outside of the shared context `sc`? If so, how can we access that shared context from a standalone Python library as described?

mikifare
New Contributor II

Thanks for the information.

 

MichTalebzadeh
Contributor

Try this; pay attention to the import:

from pyspark.sql import SparkSession
appName = "abc"

# Create a SparkSession
spark = SparkSession.builder \
    .appName(appName) \
    .getOrCreate()

# Your PySpark code blah blah

# Stop the SparkSession when done
spark.stop()
Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

view my LinkedIn profile



https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one-thousand expert opinions" (Wernher von Braun).

Thanks @MichTalebzadeh, but I have tried this to no avail. I get a [MASTER_URL_NOT_SET] error, and when I try to set the master, I get an error stating that I cannot create another Spark session. getActiveSession() returns null from within the script, but returns the session when called from the notebook.

MichTalebzadeh
Contributor

Thanks. Please send the full details of the error you are getting.


Sadly, I hit these errors in my corporate environment and can't share the exact error details from this account. But it is quite close to what @testing3 describes.

Kaizen
Contributor III

I came across a similar issue. 

Please detail how you are executing the Python script. Are you calling it from the web terminal, or from a notebook?

Note: if you are calling it from the web terminal, your Spark session won't be passed. You could create a local variable and pass it in if you'd like, though I have never gotten that far myself.
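Since a SparkSession cannot cross a process boundary, one workaround is to avoid the subprocess entirely: structure the script so its Spark work takes the session as a parameter, then import and call it from the notebook, passing the notebook's shared `spark`. A sketch under that assumption; the `run_job` function and `my_script` module layout are hypothetical:

```python
# my_script.py (hypothetical layout): all Spark work takes the session
# as a parameter instead of creating its own.

def run_job(spark, n=10):
    """Count n rows built with spark.range(); every Spark call goes
    through the session the caller supplies."""
    return spark.range(n).count()

# From a Databricks notebook, skip os.popen/subprocess and call it
# in-process with the notebook's shared session:
#
#   from my_script import run_job
#   run_job(spark)   # `spark` is the session the notebook provides
```

Because the function never builds its own context, the [MASTER_URL_NOT_SET] and "use the shared context" errors discussed above should not arise when it runs inside the notebook's process.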

Spartan101
New Contributor III

I am running the script from a Databricks notebook: `!streamlit run MyScript.py`

Hi,

I believe running a Streamlit app directly from a Databricks notebook using `!streamlit run <python_code>.py` is not the way to do it, because Databricks notebooks are not designed to host Streamlit apps.

OK, try the below.

Create a Databricks runtime with Streamlit pre-installed. Configure the cluster: when creating a new Databricks cluster, select a runtime that includes Streamlit pre-installed. This eliminates the installation step.

Run the script: within a notebook cell, execute the script directly:

!streamlit run MyScript.py

HTH
