<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Run Pyspark job of Python egg package using spark submit on databricks in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13953#M8528</link>
    <description>&lt;P&gt;Thanks a lot for looking into this issue and providing the above solution, but my expected scenario is one where I want to read the main function .py file from a .zip package (which includes a number of .py files). Can you please tell me how to pass the main function Python file, or how it will be referenced?&lt;/P&gt;</description>
    <pubDate>Tue, 12 Oct 2021 10:18:30 GMT</pubDate>
    <dc:creator>ItsMe</dc:creator>
    <dc:date>2021-10-12T10:18:30Z</dc:date>
    <item>
      <title>Run Pyspark job of Python egg package using spark submit on databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13949#M8524</link>
      <description>&lt;P&gt;Error: missing application resource&lt;/P&gt;&lt;P&gt;I am getting this error while running a job with spark-submit. I gave the following parameters when creating the job:&lt;/P&gt;&lt;P&gt;--conf spark.yarn.appMasterEnv.PYSAPRK_PYTHON=databricks/path/python3&lt;/P&gt;&lt;P&gt;--py-files dbfs/path/to/.egg job_main.py&lt;/P&gt;&lt;P&gt;As far as I can tell, these follow the spark-submit syntax that Databricks expects.&lt;/P&gt;&lt;P&gt;Can anyone please let me know if anything is missing from the spark-submit parameters?&lt;/P&gt;</description>
      <pubDate>Thu, 07 Oct 2021 06:58:11 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13949#M8524</guid>
      <dc:creator>ItsMe</dc:creator>
      <dc:date>2021-10-07T06:58:11Z</dc:date>
    </item>
    <item>
      <title>Re: Run Pyspark job of Python egg package using spark submit on databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13951#M8526</link>
      <description>&lt;P&gt;Hi @D M,&lt;/P&gt;&lt;P&gt;Good day.&lt;/P&gt;&lt;P&gt;Can you please try with:&lt;/P&gt;&lt;P&gt;--py-files /dbfs/path/to/.egg job_main.py&lt;/P&gt;&lt;P&gt;The leading /dbfs in the path will invoke the FUSE driver.&lt;/P&gt;&lt;P&gt;If the spark-submit still fails, can you please provide the full stack trace?&lt;/P&gt;</description>
      <pubDate>Mon, 11 Oct 2021 14:19:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13951#M8526</guid>
      <dc:creator>User16752246494</dc:creator>
      <dc:date>2021-10-11T14:19:44Z</dc:date>
    </item>
    <item>
      <title>Re: Run Pyspark job of Python egg package using spark submit on databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13952#M8527</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;We tried to reproduce the question on our end by packaging a module inside a whl file.&lt;/P&gt;&lt;P&gt;To access the wheel file, we created another Python file, test_whl_locally.py. Inside test_whl_locally.py, you first have to import the module or class you want to access from the wheel, e.g.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# Syntax:
# from &amp;lt;wheelpackagedirname&amp;gt;.&amp;lt;module&amp;gt; import &amp;lt;className&amp;gt;
# refVar = &amp;lt;className&amp;gt;()
# Example:

from somewhlpackage.module_two import ModuleTwo

moduleTwo = ModuleTwo()
moduleTwo.print()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Now upload both the wheel package (in your case, the egg file) and the calling Python file (test_whl_locally.py in our case, job_main.py in yours) to DBFS. Once that is done, configure your spark-submit parameters:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;["--py-files","dbfs:/FileStore/tables/whls/somewhlpackage-1.0.0-py3-none-any.whl","dbfs:/FileStore/tables/whls/test_whl_locally.py"]&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Note that --py-files is given the fully qualified DBFS path of the package, followed by the fully qualified path of the calling Python file. When --py-files runs, it installs the wheel or egg file in the virtual environment that it creates.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2416iED3E7BBAAF12D740/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Oct 2021 18:32:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13952#M8527</guid>
      <dc:creator>User16752246494</dc:creator>
      <dc:date>2021-10-11T18:32:45Z</dc:date>
    </item>
    <item>
      <title>Re: Run Pyspark job of Python egg package using spark submit on databricks</title>
      <link>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13953#M8528</link>
      <description>&lt;P&gt;Thanks a lot for looking into this issue and providing the above solution, but my expected scenario is one where I want to read the main function .py file from a .zip package (which includes a number of .py files). Can you please tell me how to pass the main function Python file, or how it will be referenced?&lt;/P&gt;</description>
      <pubDate>Tue, 12 Oct 2021 10:18:30 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/run-pyspark-job-of-python-egg-package-using-spark-submit-on/m-p/13953#M8528</guid>
      <dc:creator>ItsMe</dc:creator>
      <dc:date>2021-10-12T10:18:30Z</dc:date>
    </item>
  </channel>
</rss>

