<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Spark Error when running python script on Databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-error-when-running-python-script-on-databricks/m-p/14406#M8900</link>
    <description>&lt;P&gt;I have the following basic script, which works fine in PyCharm on my machine.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;/P&gt;&lt;P&gt;print("START")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark = SparkSession \&lt;/P&gt;&lt;P&gt;    .Builder() \&lt;/P&gt;&lt;P&gt;    .appName("myapp") \&lt;/P&gt;&lt;P&gt;    .master('local[*, 4]') \&lt;/P&gt;&lt;P&gt;    .getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;print(spark)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data = [('James', '', 'Smith', '1991-04-01', 'M', 3000),&lt;/P&gt;&lt;P&gt;        ('Michael', 'Rose', '', '2000-05-19', 'M', 4000),&lt;/P&gt;&lt;P&gt;        ('Robert', '', 'Williams', '1978-09-05', 'M', 4000),&lt;/P&gt;&lt;P&gt;        ('Maria', 'Anne', 'Jones', '1967-12-01', 'F', 4000),&lt;/P&gt;&lt;P&gt;        ('Jen', 'Mary', 'Brown', '1980-02-17', 'F', -1)&lt;/P&gt;&lt;P&gt;        ]&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;columns = ["firstname", "middlename", "lastname", "dob", "gender", "salary"]&lt;/P&gt;&lt;P&gt;df = spark.createDataFrame(data=data, schema=columns)&lt;/P&gt;&lt;P&gt;print(df)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I try to run it on a Databricks cluster directly as a Python script, it fails with the error below.&lt;/P&gt;&lt;P&gt;START&lt;/P&gt;&lt;P&gt;Traceback (most recent call last):&lt;/P&gt;&lt;P&gt;File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main&lt;/P&gt;&lt;P&gt;return _run_code(code, main_globals, None,&lt;/P&gt;&lt;P&gt;File "/usr/lib/python3.8/runpy.py", line 87, in _run_code&lt;/P&gt;&lt;P&gt;exec(code, run_globals)&lt;/P&gt;&lt;P&gt;File "/Workspace/Repos/***********/sdk_test/tests/snippets/spark_tests.py", line 13, in&lt;/P&gt;&lt;P&gt;class SparkTests:&lt;/P&gt;&lt;P&gt;File "/Workspace/Repos/*******/sdk_test/tests/snippets/spark_tests.py", line 16, in SparkTests&lt;/P&gt;&lt;P&gt;sc = SparkContext.getOrCreate()&lt;/P&gt;&lt;P&gt;File "/databricks/spark/python/pyspark/context.py", line 400, in getOrCreate&lt;/P&gt;&lt;P&gt;SparkContext(conf=conf or SparkConf())&lt;/P&gt;&lt;P&gt;File "/databricks/spark/python/pyspark/context.py", line 147, in __init__&lt;/P&gt;&lt;P&gt;self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,&lt;/P&gt;&lt;P&gt;File "/databricks/spark/python/pyspark/context.py", line 192, in _do_init&lt;/P&gt;&lt;P&gt;raise RuntimeError("A master URL must be set in your configuration")&lt;/P&gt;&lt;P&gt;RuntimeError: A master URL must be set in your configuration&lt;/P&gt;&lt;P&gt;CalledProcessError: Command 'b'cd ../\n\n/databricks/python3/bin/python -m tests.snippets.spark_tests\n# python -m tests.runner --env=qa --runtime_env=databricks --upload=True --package=sdk\n'' returned non-zero exit status 1.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What am I missing?&lt;/P&gt;</description>
    <pubDate>Thu, 07 Jul 2022 11:15:40 GMT</pubDate>
    <dc:creator>170017</dc:creator>
    <dc:date>2022-07-07T11:15:40Z</dc:date>
    <item>
      <title>Spark Error when running python script on databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-error-when-running-python-script-on-databricks/m-p/14406#M8900</link>
      <description>&lt;P&gt;I have the following basic script, which works fine in PyCharm on my machine.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;from pyspark.sql import SparkSession&lt;/P&gt;&lt;P&gt;print("START")&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;spark = SparkSession \&lt;/P&gt;&lt;P&gt;    .Builder() \&lt;/P&gt;&lt;P&gt;    .appName("myapp") \&lt;/P&gt;&lt;P&gt;    .master('local[*, 4]') \&lt;/P&gt;&lt;P&gt;    .getOrCreate()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;print(spark)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;data = [('James', '', 'Smith', '1991-04-01', 'M', 3000),&lt;/P&gt;&lt;P&gt;        ('Michael', 'Rose', '', '2000-05-19', 'M', 4000),&lt;/P&gt;&lt;P&gt;        ('Robert', '', 'Williams', '1978-09-05', 'M', 4000),&lt;/P&gt;&lt;P&gt;        ('Maria', 'Anne', 'Jones', '1967-12-01', 'F', 4000),&lt;/P&gt;&lt;P&gt;        ('Jen', 'Mary', 'Brown', '1980-02-17', 'F', -1)&lt;/P&gt;&lt;P&gt;        ]&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;columns = ["firstname", "middlename", "lastname", "dob", "gender", "salary"]&lt;/P&gt;&lt;P&gt;df = spark.createDataFrame(data=data, schema=columns)&lt;/P&gt;&lt;P&gt;print(df)&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;However, when I try to run it on a Databricks cluster directly as a Python script, it fails with the error below.&lt;/P&gt;&lt;P&gt;START&lt;/P&gt;&lt;P&gt;Traceback (most recent call last):&lt;/P&gt;&lt;P&gt;File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main&lt;/P&gt;&lt;P&gt;return _run_code(code, main_globals, None,&lt;/P&gt;&lt;P&gt;File "/usr/lib/python3.8/runpy.py", line 87, in _run_code&lt;/P&gt;&lt;P&gt;exec(code, run_globals)&lt;/P&gt;&lt;P&gt;File "/Workspace/Repos/***********/sdk_test/tests/snippets/spark_tests.py", line 13, in&lt;/P&gt;&lt;P&gt;class SparkTests:&lt;/P&gt;&lt;P&gt;File "/Workspace/Repos/*******/sdk_test/tests/snippets/spark_tests.py", line 16, in SparkTests&lt;/P&gt;&lt;P&gt;sc = SparkContext.getOrCreate()&lt;/P&gt;&lt;P&gt;File "/databricks/spark/python/pyspark/context.py", line 400, in getOrCreate&lt;/P&gt;&lt;P&gt;SparkContext(conf=conf or SparkConf())&lt;/P&gt;&lt;P&gt;File "/databricks/spark/python/pyspark/context.py", line 147, in __init__&lt;/P&gt;&lt;P&gt;self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,&lt;/P&gt;&lt;P&gt;File "/databricks/spark/python/pyspark/context.py", line 192, in _do_init&lt;/P&gt;&lt;P&gt;raise RuntimeError("A master URL must be set in your configuration")&lt;/P&gt;&lt;P&gt;RuntimeError: A master URL must be set in your configuration&lt;/P&gt;&lt;P&gt;CalledProcessError: Command 'b'cd ../\n\n/databricks/python3/bin/python -m tests.snippets.spark_tests\n# python -m tests.runner --env=qa --runtime_env=databricks --upload=True --package=sdk\n'' returned non-zero exit status 1.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What am I missing?&lt;/P&gt;</description>
      <pubDate>Thu, 07 Jul 2022 11:15:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-error-when-running-python-script-on-databricks/m-p/14406#M8900</guid>
      <dc:creator>170017</dc:creator>
      <dc:date>2022-07-07T11:15:40Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Error when running python script on databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-error-when-running-python-script-on-databricks/m-p/14408#M8902</link>
      <description>&lt;P&gt;Hi @Patricia Mayer,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Just wanted to check in: were you able to resolve your issue, or do you need more help? We'd love to hear from you.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks!&lt;/P&gt;</description>
      <pubDate>Thu, 01 Sep 2022 08:19:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-error-when-running-python-script-on-databricks/m-p/14408#M8902</guid>
      <dc:creator>Vidula</dc:creator>
      <dc:date>2022-09-01T08:19:35Z</dc:date>
    </item>
  </channel>
</rss>

