Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Spark Error when running python script on databricks

170017
New Contributor II

I have the following basic script, which works fine when I run it from PyCharm on my machine.

from pyspark.sql import SparkSession

print("START")

spark = SparkSession \
    .Builder() \
    .appName("myapp") \
    .master('local[*, 4]') \
    .getOrCreate()

print(spark)

data = [('James', '', 'Smith', '1991-04-01', 'M', 3000),
        ('Michael', 'Rose', '', '2000-05-19', 'M', 4000),
        ('Robert', '', 'Williams', '1978-09-05', 'M', 4000),
        ('Maria', 'Anne', 'Jones', '1967-12-01', 'F', 4000),
        ('Jen', 'Mary', 'Brown', '1980-02-17', 'F', -1)
        ]

columns = ["firstname", "middlename", "lastname", "dob", "gender", "salary"]

df = spark.createDataFrame(data=data, schema=columns)

print(df)

However, when I run it on a Databricks cluster directly as a Python script, it fails with the following error:

START
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Workspace/Repos/***********/sdk_test/tests/snippets/spark_tests.py", line 13, in <module>
    class SparkTests:
  File "/Workspace/Repos/*******/sdk_test/tests/snippets/spark_tests.py", line 16, in SparkTests
    sc = SparkContext.getOrCreate()
  File "/databricks/spark/python/pyspark/context.py", line 400, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/databricks/spark/python/pyspark/context.py", line 147, in __init__
    self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,
  File "/databricks/spark/python/pyspark/context.py", line 192, in _do_init
    raise RuntimeError("A master URL must be set in your configuration")
RuntimeError: A master URL must be set in your configuration

CalledProcessError: Command 'b'cd ../\n\n/databricks/python3/bin/python -m tests.snippets.spark_tests\n# python -m tests.runner --env=qa --runtime_env=databricks --upload=True --package=sdk\n'' returned non-zero exit status 1.

What am I missing?

1 REPLY

Vidula
Honored Contributor

Hi @Patricia Mayer,

Just checking in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.

Thanks!
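
For anyone landing here with the same traceback: note that the failing call is SparkContext.getOrCreate() at line 16 of spark_tests.py, inside the SparkTests class body, not the SparkSession builder from the posted script. The module was started as a separate process (/databricks/python3/bin/python -m tests.snippets.spark_tests), and a plain Python process like that most likely has no spark.master in its configuration, which is exactly what the RuntimeError reports. Below is a minimal sketch of one possible workaround, not a confirmed fix: always go through a builder that sets a master explicitly. The helper names resolve_master and get_spark and the environment variable MYAPP_SPARK_MASTER are hypothetical, made up for this illustration. Also keep in mind that a local[*] master gives you a standalone local session on the driver node, not the session attached to the cluster.

```python
import os

# Fallback used when no master is configured (assumption: a local session is acceptable).
DEFAULT_MASTER = "local[*]"


def resolve_master(env=None):
    """Pick a master URL.

    MYAPP_SPARK_MASTER is a hypothetical variable invented for this sketch;
    it is not something Spark or Databricks reads on its own.
    """
    env = os.environ if env is None else env
    return env.get("MYAPP_SPARK_MASTER") or DEFAULT_MASTER


def get_spark(app_name="myapp"):
    """Return a SparkSession, setting the master explicitly.

    pyspark is imported lazily so that importing this module does not
    start or require Spark.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        .master(resolve_master())
        .getOrCreate()  # reuses an already running session if there is one
    )
```

In spark_tests.py, the class-level sc = SparkContext.getOrCreate() could then become sc = get_spark().sparkContext, ideally moved out of the class body into a setup method so that importing the module does not start Spark.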
