<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27569#M19435</link>
    <description>How to specify multiple files in --py-files in the spark-submit command for a Databricks job, where all the files are in DBFS.</description>
    <pubDate>Thu, 14 Nov 2019 10:34:57 GMT</pubDate>
    <dc:creator>NandhaKumar</dc:creator>
    <dc:date>2019-11-14T10:34:57Z</dc:date>
    <item>
      <title>How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27569#M19435</link>
      <description>&lt;P&gt;I have created a Databricks workspace in Azure and a cluster for Python 3. I am creating a job using spark-submit parameters. How do I specify multiple files in --py-files in the spark-submit command for a Databricks job? All the files to be passed to --py-files are in DBFS.&lt;/P&gt;
&lt;P&gt;&lt;B&gt;["--packages","org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1,com.databricks:spark-redshift_2.11:3.0.0-preview1","--py-files","dbfs:/FileStore/tables/configs.zip, dbfs:/FileStore/tables/libs.zip","dbfs:/FileStore/tables/read_batch.py"]&lt;/B&gt;&lt;/P&gt;
&lt;P&gt;This command produces the following error:&lt;/P&gt;
&lt;P&gt;201 artifacts copied, 0 already retrieved (104316kB/165ms) Exception in thread "main"&lt;/P&gt;
&lt;PRE&gt;java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: dbfs:
    at org.apache.hadoop.fs.Path.initialize(Path.java:205)
    at org.apache.hadoop.fs.Path.&amp;lt;init&amp;gt;(Path.java:171)
    at org.apache.hadoop.fs.Path.&amp;lt;init&amp;gt;(Path.java:93)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:211)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657)
    at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$resolveGlobPath(DependencyUtils.scala:192)
    at org.apache.spark.deploy.DependencyUtils$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147)
    at org.apache.spark.deploy.DependencyUtils$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145)
    at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145)
    at org.apache.spark.deploy.SparkSubmit$anonfun$prepareSubmitEnvironment$5.apply(SparkSubmit.scala:356)
    at org.apache.spark.deploy.SparkSubmit$anonfun$prepareSubmitEnvironment$5.apply(SparkSubmit.scala:356)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:356)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: dbfs:
    at java.net.URI$Parser.fail(URI.java:2848)
    at java.net.URI$Parser.checkChars(URI.java:3021)
    at java.net.URI$Parser.checkChar(URI.java:3031)
    at java.net.URI$Parser.parse(URI.java:3047)
    at java.net.URI.&amp;lt;init&amp;gt;(URI.java:746)
    at org.apache.hadoop.fs.Path.initialize(Path.java:202)&lt;/PRE&gt;</description>
      <pubDate>Thu, 14 Nov 2019 10:34:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27569#M19435</guid>
      <dc:creator>NandhaKumar</dc:creator>
      <dc:date>2019-11-14T10:34:57Z</dc:date>
    </item>
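The "Illegal character in scheme name at index 0" failure above is consistent with the space after the comma in the --py-files value ("configs.zip, dbfs:/..."): spark-submit splits that value on commas without trimming whitespace (the split happens inside DependencyUtils.resolveGlobPaths, visible in the stack trace), so the second entry begins with " dbfs:" and fails URI parsing at its very first character. A minimal sketch of the behaviour, using the path strings from the post:

```python
# Sketch: why " dbfs:/..." fails scheme parsing while "dbfs:/..." succeeds.
# spark-submit splits the --py-files value on commas without trimming
# whitespace, so a space after a comma becomes part of the path itself.
from urllib.parse import urlparse

def split_py_files(value):
    """Mimic the naive comma split applied to the --py-files value."""
    return value.split(",")

bad = "dbfs:/FileStore/tables/configs.zip, dbfs:/FileStore/tables/libs.zip"
good = "dbfs:/FileStore/tables/configs.zip,dbfs:/FileStore/tables/libs.zip"

print(repr(split_py_files(bad)[1]))   # ' dbfs:/FileStore/tables/libs.zip' -- leading space
print(repr(split_py_files(good)[1]))  # 'dbfs:/FileStore/tables/libs.zip'

# A URI scheme must start with a letter (RFC 3986), so the leading space
# makes the character at index 0 illegal -- matching the stack trace above.
print(urlparse(split_py_files(bad)[1]).scheme)   # '' -- no scheme recognized
print(urlparse(split_py_files(good)[1]).scheme)  # 'dbfs'
```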
    <item>
      <title>Re: How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27570#M19436</link>
      <description>&lt;P&gt;Hi @Nandha Kumar,&lt;/P&gt;&lt;P&gt;please go through the docs below on how to pass Python files to a job:&lt;/P&gt;&lt;P&gt;&lt;A href="https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask" target="_blank"&gt;https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2019 05:46:20 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27570#M19436</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-11-18T05:46:20Z</dc:date>
    </item>
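For reference, a job definition using the spark_python_task from the linked doc might look like the sketch below. The field names (spark_python_task, python_file, libraries) come from the Jobs API documentation, but the cluster values and the .egg paths are placeholders; note that libraries entries take .egg/.whl files rather than raw .zip archives, so dependencies would be repackaged accordingly:

```json
{
  "name": "read-batch-job",
  "new_cluster": {
    "spark_version": "5.5.x-scala2.11",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 2
  },
  "spark_python_task": {
    "python_file": "dbfs:/FileStore/tables/read_batch.py"
  },
  "libraries": [
    { "egg": "dbfs:/FileStore/tables/configs.egg" },
    { "egg": "dbfs:/FileStore/tables/libs.egg" }
  ]
}
```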
    <item>
      <title>Re: How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27571#M19437</link>
      <description>&lt;P&gt;Thanks @Shyamprasad Miryala. But how do I give &lt;B&gt;&lt;I&gt;more than one&lt;/I&gt;&lt;/B&gt; file to &lt;B&gt;&lt;I&gt;--py-files&lt;/I&gt;&lt;/B&gt; when they are present in DBFS?&lt;/P&gt;</description>
      <pubDate>Mon, 18 Nov 2019 05:57:39 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27571#M19437</guid>
      <dc:creator>NandhaKumar</dc:creator>
      <dc:date>2019-11-18T05:57:39Z</dc:date>
    </item>
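If the goal is to keep the spark_submit_task, the original parameter list appears fixable by simply removing the space after the comma, so that each dbfs: URI starts with a clean scheme. A sketch with the paths from the original post (multiple files are a single comma-separated value, no spaces):

```json
[
  "--packages", "org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1,com.databricks:spark-redshift_2.11:3.0.0-preview1",
  "--py-files", "dbfs:/FileStore/tables/configs.zip,dbfs:/FileStore/tables/libs.zip",
  "dbfs:/FileStore/tables/read_batch.py"
]
```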
    <item>
      <title>Re: How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27572#M19438</link>
      <description>&lt;P&gt;If you depend on multiple Python files, we recommend packaging them into a .zip or .egg.&lt;/P&gt;</description>
      <pubDate>Thu, 19 Dec 2019 10:47:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-specify-multiple-files-in-py-files-in-spark-submit/m-p/27572#M19438</guid>
      <dc:creator>shyam_9</dc:creator>
      <dc:date>2019-12-19T10:47:25Z</dc:date>
    </item>
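The packaging step recommended above can be sketched with Python's standard zipfile module. The package and module names here (configs/settings.py) are hypothetical placeholders, and the final upload to DBFS is shown only as a comment using the Databricks CLI:

```python
# Package helper modules into a single archive that --py-files accepts.
# The package/module names below are hypothetical placeholders.
import os
import zipfile

def make_py_files_zip(zip_path, sources):
    """Write each source file into zip_path, preserving its relative path
    so that e.g. `import configs.settings` works on the executors."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for src in sources:
            zf.write(src, arcname=src)

os.makedirs("configs", exist_ok=True)
with open(os.path.join("configs", "settings.py"), "w") as f:
    f.write("BATCH_SIZE = 100\n")
# a package needs an __init__.py to be importable from inside the zip
open(os.path.join("configs", "__init__.py"), "w").close()

make_py_files_zip("configs.zip", ["configs/__init__.py", "configs/settings.py"])
print(zipfile.ZipFile("configs.zip").namelist())

# Then upload and reference it, e.g.:
#   databricks fs cp configs.zip dbfs:/FileStore/tables/configs.zip
```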
  </channel>
</rss>

