Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to instantiate the Databricks Spark context in a Python script?

ae20cg
New Contributor III

I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot properly instantiate the Spark context without getting an error.

I have tried `SparkContext.getOrCreate()`, but this does not work.

Is there a simple way to do this I am missing?
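[For context, a minimal sketch of the attempt described above — illustrative only, run as a plain Python script rather than in a notebook:]

from pyspark import SparkContext

# In a bare Python process with no master URL configured, this
# typically fails with an error like
# "A master URL must be set in your configuration".
sc = SparkContext.getOrCreate()
print(sc.master)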

17 REPLIES

Ajay-Pandey
Esteemed Contributor III

Hi @Andrej Erkelens,

Can you please send the error you are getting when using the above code?

Ajay Kumar Pandey

ae20cg
New Contributor III

@Kaniz Fatma

Hi, I have tried this but receive an error:

`RuntimeError: A master URL must be set in your configuration`

Is there something I am missing to use a Databricks cluster (AWS backend) from a Python script?

Thanks

I have the same problem, and would be interested in a solution

Hi,
I tried doing this and got a master URL and app name error. I tried setting those and then got an error message asking me not to create a Spark session on Databricks and to use SparkContext.getOrCreate() instead. But that leads back to the same error. I used the getActiveSession() method to verify that the Python script does not have access to a Spark session.
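[For anyone following along, that check looks roughly like this — a sketch; SparkSession.getActiveSession() is available in PySpark 3.0+:]

from pyspark.sql import SparkSession

# Returns the session live in this Python process, or None if there
# is none -- which is what the standalone script reportedly sees.
active = SparkSession.getActiveSession()
print("active session:", active)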

testing3
New Contributor II

Did this ever get addressed? I would like to use a Databricks notebook to launch a Python-based child process (via os.popen) that itself ultimately needs to use PySpark. When I try this, I am either told to supply a master URL to the Spark context, or, if I set local[*] as the master, I get an exception on Spark interaction saying that notebooks should use the shared context available via sc. This code executes in a standalone Python library (based on Python, but not just a Python script) run by the subprocess launched from the notebook.

Is it simply disallowed to access Spark outside of the shared context sc? If so, how can we access that shared context from a standalone python library as I describe?
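[One possible workaround, sketched under the assumption that a child process cannot attach to the notebook's shared context: keep all Spark work in the parent notebook and hand the child plain data. Here `spark` is the notebook-provided session and my_tool.py is a placeholder name for the standalone script.]

import json
import subprocess

# Parent (notebook) side: the shared `spark` session does the Spark
# work; the child process only ever sees ordinary JSON.
rows = [r.asDict() for r in spark.range(5).collect()]  # illustrative query

proc = subprocess.run(
    ["python", "my_tool.py"],  # hypothetical standalone script
    input=json.dumps(rows),
    capture_output=True,
    text=True,
)
print(proc.stdout)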

mikifare
New Contributor II

Thanks for the information.

 

MichTalebzadeh
Valued Contributor

Try this; pay attention to the import:

from pyspark.sql import SparkSession
appName = "abc"

# Create a SparkSession
spark = SparkSession.builder \
    .appName(appName) \
    .getOrCreate()

# Your PySpark code goes here

# Stop the SparkSession when done
spark.stop()
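[A defensive variation on the above — a sketch, not official Databricks guidance: reuse whatever session the platform already provides, and avoid stopping a session the script did not create, which sidesteps the "cannot create another session" error path described earlier in this thread.]

from pyspark.sql import SparkSession

# Prefer the session already live in this process (as in a Databricks
# notebook); only build one when running truly standalone.
spark = SparkSession.getActiveSession()
created = spark is None
if created:
    spark = SparkSession.builder.appName("abc").getOrCreate()

# ... PySpark code goes here ...

# Only stop a session this script created itself.
if created:
    spark.stop()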
Mich Talebzadeh | Technologist | Data | Generative AI | Financial Fraud
London
United Kingdom

View my LinkedIn profile



https://en.everybodywiki.com/Mich_Talebzadeh



Disclaimer: The information provided is correct to the best of my knowledge but cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one thousand expert opinions" (Wernher von Braun).

Thanks @MichTalebzadeh, but I have tried this to no avail. I get the [MASTER_URL_NOT_SET] error, and when I try to set the master URL, I get an error stating that I cannot create another Spark session. getActiveSession() returns null from within the script, but returns the session when called from the notebook.

MichTalebzadeh
Valued Contributor

Thanks. Please send the full details of the error you are getting.


Sadly, the errors occur in my corporate environment and I can't share the exact error details from this account. But they are quite close to those of @testing3.

Kaizen
Valued Contributor

I came across a similar issue. 

Please detail how you are executing the Python script. Are you calling it from the web terminal, or from a notebook?

Note: if you are calling it from the web terminal, your Spark session won't be passed. You could create a local variable and pass it in if you'd like; I have never gotten to that point myself, though.

Spartan101
New Contributor III

I am running the script from a Databricks notebook: !streamlit run MyScript.py

Hi,

I believe running a Streamlit app directly from a Databricks notebook using !streamlit run <python_code>.py is not the way to do it, because Databricks notebooks are not designed to host Streamlit apps.

OK, try the below.

Create a Databricks runtime with Streamlit pre-installed. Configure the cluster: when creating a new Databricks cluster, select a runtime that includes Streamlit pre-installed. This eliminates the installation step.

Run the script: Within the notebook cell, simply execute the script directly:

!streamlit run MyScript.py

HTH
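[For completeness, a minimal sketch of what MyScript.py might look like, given the session caveat discussed above — assumes streamlit and pyspark are importable on the cluster:]

import streamlit as st
from pyspark.sql import SparkSession

st.title("Demo")

# A process started via `!streamlit run ...` usually has no Spark
# session of its own, so check before using one.
spark = SparkSession.getActiveSession()
if spark is None:
    st.warning("No active Spark session in this process.")
else:
    st.dataframe(spark.range(3).toPandas())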





Hi @MichTalebzadeh ,

Once again, thanks for your replies.
My Databricks cluster does come pre-installed with Streamlit, and I have been running the script the way you mentioned.
I am going to try alternatives to Spark for the time being, and also try with Spark session isolation disabled.

I appreciate you taking the time to respond to this issue.
