How to instantiate the Databricks Spark context in a Python script?
02-22-2023 02:14 PM
I want to run a block of code in a script, not in a notebook, on Databricks; however, I cannot properly instantiate the Spark context without some error.
I have tried `SparkContext.getOrCreate()`, but this does not work.
Is there a simple way to do this that I am missing?
- Labels:
  - Python script
  - Sparkcontext
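One pattern worth trying first: on a Databricks cluster, a script usually should not build a new `SparkContext` at all. A minimal sketch of attaching to the cluster's existing session (assuming the script actually runs on the cluster, e.g. as a Jobs task or via `%run`) is:

```python
from pyspark.sql import SparkSession

# On Databricks, builder.getOrCreate() attaches to the session the
# cluster already provides instead of constructing a new SparkContext,
# which avoids the "master URL must be set" error.
spark = SparkSession.builder.getOrCreate()

# Trivial DataFrame to confirm the session works.
spark.range(3).show()
```

This only helps when the script runs in a process that can see the cluster's Spark gateway; as the replies below show, child processes launched from a notebook do not.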
02-22-2023 08:29 PM
Hi @Andrej Erkelens ,
Can you please share the error you are getting when using the above code?
02-23-2023 09:11 AM
@Kaniz Fatma
Hi, I have tried this but receive an error:
`RuntimeError: A master URL must be set in your configuration`
Is there something I am missing to use a Databricks cluster (AWS backend) in a Python script?
Thanks
04-14-2023 02:21 PM
I have the same problem and would be interested in a solution.
03-12-2024 10:02 PM
Hi,
I tried doing this and get master URL and app name errors. I tried setting those and got an error message asking not to create a Spark session on Databricks and to use the `SparkContext.getOrCreate()` method instead.
But that leads to the same error. I used the `getActiveSession()` method to verify that the Python script does not have access to a Spark session.
01-05-2024 01:06 PM
Did this ever get addressed? I would like to use a Databricks notebook to launch a Python-based child process (via `os.popen`) that itself ultimately needs to use PySpark. When I try this, I am either told to supply a master URL to the Spark context, or, if I set `local[*]` as the master, an exception on Spark interaction tells me that notebooks should use the shared context available via `sc`. This code executes in a standalone Python library (Python-based, but not just a Python script) run by the subprocess launched from the notebook.
Is it simply disallowed to access Spark outside of the shared context `sc`? If so, how can we access that shared context from a standalone Python library as I describe?
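Since a child process cannot attach to the notebook's shared context, one workaround is to do the Spark work in the notebook and hand the child plain data. A sketch (the file name and inline child script are illustrative, not a Databricks API; in the notebook the stand-in list would come from something like `[r.asDict() for r in df.collect()]`):

```python
import json
import os
import subprocess
import sys
import tempfile

# Stand-in for rows collected from Spark in the notebook.
rows = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]

# Write the data where the child process can read it.
data_path = os.path.join(tempfile.gettempdir(), "rows.json")
with open(data_path, "w") as f:
    json.dump(rows, f)

# The child is ordinary Python: it never touches Spark, so it never
# needs a master URL or the shared context.
child_code = (
    "import json, sys\n"
    "rows = json.load(open(sys.argv[1]))\n"
    "print(len(rows))\n"
)
out = subprocess.run(
    [sys.executable, "-c", child_code, data_path],
    capture_output=True, text=True, check=True,
)
print(out.stdout.strip())  # number of rows handed to the child
```

The trade-off is that everything crosses the boundary as serialized data, so this only suits results small enough to collect to the driver.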
01-07-2024 02:37 AM - edited 01-07-2024 02:41 AM
Thanks for the information.
03-13-2024 03:51 AM
Try this; pay attention to the import:

```python
from pyspark.sql import SparkSession

appName = "abc"

# Create (or reuse) a SparkSession
spark = SparkSession.builder \
    .appName(appName) \
    .getOrCreate()

# Your PySpark code goes here

# Stop the SparkSession when done
spark.stop()
```
London
United Kingdom
view my Linkedin profile
https://en.everybodywiki.com/Mich_Talebzadeh
Disclaimer: The information provided is correct to the best of my knowledge but of course cannot be guaranteed. It is essential to note that, as with any advice, "one test result is worth one thousand expert opinions" (Wernher von Braun).
03-13-2024 07:55 AM
Thanks @MichTalebzadeh, but I have tried this to no avail. I get a `[MASTER_URL_NOT_SET]` error, and when I try to set the master, I get an error stating I cannot create another Spark session. `getActiveSession()` returns null from within the script, but returns the session when called from the notebook.
03-13-2024 08:03 AM
Thanks. Please send the full details of the error you are getting.
03-13-2024 10:03 AM
Sadly, the errors are in my corporate environment and I can't show the exact error details from this account, but they are quite close to those of @testing3.
03-13-2024 08:31 AM
I came across a similar issue.
Please detail how you are executing the Python script. Are you calling it from the web terminal, or from a notebook?
Note: if you are calling it from the web terminal, your Spark session won't be passed. You could create a local variable and pass it in if you'd like, though I have never gotten to that point yet.
03-13-2024 10:04 AM
I am running the script from a Databricks notebook: `!streamlit run MyScript.py`
03-14-2024 07:47 AM
Hi,
I believe running a Streamlit app directly from a Databricks notebook using `!streamlit run <python_code>.py` is not the way to do it, because Databricks notebooks are not designed to host Streamlit apps.
OK, try the below:
Create a Databricks runtime with Streamlit pre-installed. Configure the cluster: when creating a new Databricks cluster, select a runtime that includes Streamlit pre-installed. This eliminates the installation step.
Run the script: within the notebook cell, simply execute the script directly:
`!streamlit run MyScript.py`
HTH
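For a standalone script such as the Streamlit app in this thread, another option worth considering is Databricks Connect, which builds a remote session from outside the notebook instead of relying on the notebook's shared context. A sketch, assuming the `databricks-connect` package (Databricks Connect v2) is installed and configured with the workspace host, token, and cluster id:

```python
# DatabricksSession comes from the databricks-connect package and
# reads connection details from the environment or a Databricks
# config profile; it does not need a master URL.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# Trivial query to confirm the remote session works.
spark.range(3).show()
```

This keeps the script an ordinary Python process, so it can be launched by Streamlit, a subprocess, or a scheduler without fighting over the notebook's `sc`.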
03-14-2024 10:57 AM
Hi @MichTalebzadeh ,
Once again, thanks for your replies.
My Databricks cluster does come pre-installed with Streamlit, and I have been running the script the way you mentioned.
I am going to try alternatives to Spark for the time being, and also try with Spark session isolation disabled.
I appreciate you taking the time to respond to this issue.

