SparkContext lost when running %sh script.py

madrhr
New Contributor III

I need to execute a .py file in Databricks from a notebook (with arguments, which I omit here for simplicity). For this I am using:

%sh script.py

script.py:

from pyspark import SparkContext

def main():
    sc = SparkContext.getOrCreate()
    print(sc)

if __name__ == "__main__":
    main()

I need a SparkContext inside the .py file, and the suggested way to get one is SparkContext.getOrCreate(). When the script runs via %sh, however, that call raises an exception saying a master URL must be set:

pyspark.errors.exceptions.base.PySparkRuntimeError: [MASTER_URL_NOT_SET] A master URL must be set in your configuration.

Even if I set the master URL, I get a different exception. What is really strange is that the same .py script works when I run it directly in Databricks with the run (play) button. It also works when I open a web terminal on the cluster and execute the script in that bash shell. With both approaches I get the SparkContext, but obviously neither is very useful for my purpose. In both the %sh shell and the web terminal the user is root and the working directory is the same, and the Python environment does not appear to be the problem either.
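To check more rigorously whether the two shells really match, a small stdlib-only script like the one below could be run in both and the output compared. This is only a diagnostic sketch: the function name spark_related_env and the list of name markers are my own choices, not anything Databricks defines, and it is an assumption that an environment difference is behind the error at all.

```python
import json
import os
import sys


def spark_related_env(environ, markers=("SPARK", "PYTHON", "JAVA", "MASTER")):
    """Return the variables whose names contain any marker, sorted by name.

    The markers are a guess at which variables might matter; adjust as needed.
    """
    return {
        name: value
        for name, value in sorted(environ.items())
        if any(marker in name.upper() for marker in markers)
    }


if __name__ == "__main__":
    # Print the interpreter, working directory, and filtered environment
    # as JSON so the output from two shells can be diffed.
    report = {
        "executable": sys.executable,
        "cwd": os.getcwd(),
        "env": spark_related_env(os.environ),
    }
    print(json.dumps(report, indent=2))
```

Saving this under a hypothetical name such as env_report.py and running it once from a %sh cell and once from the web terminal gives two JSON reports. Any variable present in one and missing in the other would be a candidate explanation for the different behaviour.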

The cluster I am using is a single-node NC24ads_A100, so there is only a driver node and no additional worker nodes. I am running DBR 14.2 ML with Spark 3.5.0.

I would be very happy to learn what is special about %sh, where my mistake is, or what workaround lets me execute .py files with arguments from a Databricks notebook while still getting a SparkContext.