04-24-2024 04:41 AM
I need to execute a .py file in Databricks from a notebook (with arguments, which I omit here for simplicity). For this I am using:
%sh script.py
script.py:
from pyspark import SparkContext
def main():
    sc = SparkContext.getOrCreate()
    print(sc)
if __name__ == "__main__":
    main()
However, I need a SparkContext in the .py file, and the suggested way to get one is SparkContext.getOrCreate(), but I get an exception saying that I need to set a master URL:
pyspark.errors.exceptions.base.PySparkRuntimeError: [MASTER_URL_NOT_SET] A master URL must be set in your configuration.
But even if I set the master URL, I get another exception. What is really weird is that if I run the same .py script directly in Databricks using the little play button, it works. It also works if I open a web terminal on the cluster and execute my .py script in that bash shell. With both approaches I get the SparkContext, but neither is very useful for my case. In both the %sh shell and the web terminal, the user is root and the working directory is the same, and the Python environment is not the problem either.
The cluster I am using is a single-node NC24ads_A100, so there is only a driver node and no additional worker nodes. I am running DBR 14.2 ML with Spark 3.5.0.
I would be very happy to learn what is so special about %sh, where my problem is, or what a workaround is for executing .py files with arguments from a Databricks notebook while keeping access to the SparkContext.
04-24-2024 11:37 PM
I think this occurs because one session is initiated within the Python script (.py file), while in the Databricks notebook, we have a pre-configured Spark session. It is important to note that we cannot use more than one Spark session per notebook, and each session should be unique.
04-25-2024 01:55 AM
Thanks for your answer. That's also how I understand it. But is there a way to inject or connect to the pre-configured Spark session from within the Python script (.py file)?
04-29-2024 03:15 AM
I got it eventually working with a combination of:
04-29-2025 12:39 AM
Hey. How do you add arguments to this? My script uses argparse and I want to pass arguments like this: --arg_name "value". Thank you!
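One possible approach, offered as a sketch and not necessarily the method from the accepted answer (whose details are not shown in this thread): instead of %sh, which starts a separate process, run the script inside the notebook's own Python process with the standard-library runpy module and set sys.argv temporarily so that argparse sees the arguments. The helper names run_script_in_process and patched_argv are hypothetical and are not part of any Databricks API. The assumption is that, because the script then shares the notebook's interpreter, SparkContext.getOrCreate() in the script should return the already running context.

```python
import runpy
import sys
from contextlib import contextmanager


@contextmanager
def patched_argv(argv):
    """Temporarily replace sys.argv so argparse in the script sees our arguments."""
    saved = sys.argv
    sys.argv = list(argv)
    try:
        yield
    finally:
        # Always restore the notebook's original argv, even if the script fails.
        sys.argv = saved


def run_script_in_process(path, args=()):
    """Run a .py file in the current interpreter, like `python path *args`.

    Hypothetical helper (not a Databricks API). The script runs in the calling
    process rather than in a subprocess, so it shares that process's state.
    Returns the script's resulting module globals as a dict.
    """
    with patched_argv([path, *args]):
        # run_name="__main__" makes the script's `if __name__ == "__main__":` guard fire.
        return runpy.run_path(path, run_name="__main__")
```

In a notebook cell this could be called as run_script_in_process("script.py", ["--arg_name", "value"]). Note that if the script calls sys.exit(), or argparse rejects the arguments, the resulting SystemExit propagates into the notebook cell, so you may want to catch it there.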