I need to execute a .py file in Databricks from a notebook (with arguments, which I exclude here for simplicity). For this I am using:
```
%sh script.py
```
script.py:
```python
from pyspark import SparkContext

def main():
    sc = SparkContext.getOrCreate()
    print(sc)

if __name__ == "__main__":
    main()
```
However, I need a SparkContext in the .py file, and the suggested way to get one is SparkContext.getOrCreate(). But I get an exception telling me that I need to set a master URL:
```
pyspark.errors.exceptions.base.PySparkRuntimeError: [MASTER_URL_NOT_SET] A master URL must be set in your configuration.
```
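To be concrete, by "setting the master URL" I mean something along these lines (a simplified sketch; local mode, since the cluster is a single node):

```python
from pyspark import SparkConf, SparkContext

# Sketch only: explicitly set a master so getOrCreate() has a configuration.
conf = SparkConf().setMaster("local[*]").setAppName("script")
sc = SparkContext.getOrCreate(conf)
```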
But even with the master URL set like this, I get another exception. Now what is really weird: if I run the same .py script directly in Databricks using the little play button, it works. It also works if I open a web terminal on the cluster and execute the script in that bash shell. So both of these approaches work and I get a SparkContext, but that is obviously not very useful for my actual goal of calling the script from a notebook. In both the %sh shell and the web terminal the user is root, the working directory is the same, and the Python environment is not the problem either.
The cluster I am using is a single-node NC24ads_A100, so only a driver node and no additional worker nodes. I am running DBR 14.2 ML with Spark 3.5.0.
I would be very happy to learn what is so special about %sh, where my problem lies, or what a workaround would be to execute .py files (with arguments) from a Databricks notebook while still getting a SparkContext.
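The only workaround I can think of so far is to skip the shell entirely and run the script inside the notebook's own Python process, where the Databricks SparkContext already exists. A rough, untested sketch (the path and argument names are placeholders):

```python
import runpy
import sys

# Simulate the command-line arguments the script would normally receive.
sys.argv = ["script.py", "--some-arg", "some-value"]

# Run the script in this interpreter so SparkContext.getOrCreate() returns
# the notebook's existing context; run_name="__main__" triggers the script's
# `if __name__ == "__main__"` guard.
runpy.run_path("script.py", run_name="__main__")
```

But I would still prefer to understand why the %sh approach fails.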