cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .

NandhaKumar
New Contributor II

I have created a databricks in azure. I have created a cluster for python 3. I am creating a job using spark-submit parameters. How to specify multiple files in --py-files in spark-submit command for databricks job? All the files to be specified in --py-files present in dbfs: .

["--packages","org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.1,com.databricks:spark-redshift_2.11:3.0.0-preview1","--py-files","dbfs:/FileStore/tables/configs.zip, dbfs:/FileStore/tables/libs.zip","dbfs:/FileStore/tables/read_batch.py"]

This command produces the following error.

201 artifacts copied, 0 already retrieved (104316kB/165ms) Exception in thread "main"

java.lang.IllegalArgumentException: java.net.URISyntaxException: Illegal character in scheme name at index 0: dbfs: at org.apache.hadoop.fs.Path.initialize(Path.java:205) at org.apache.hadoop.fs.Path.<init>(Path.java:171) at org.apache.hadoop.fs.Path.<init>(Path.java:93) at org.apache.hadoop.fs.Globber.glob(Globber.java:211) at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1657) at org.apache.spark.deploy.DependencyUtils$.org$apache$spark$deploy$DependencyUtils$resolveGlobPath(DependencyUtils.scala:192) at org.apache.spark.deploy.DependencyUtils$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:147) at org.apache.spark.deploy.DependencyUtils$anonfun$resolveGlobPaths$2.apply(DependencyUtils.scala:145) at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.TraversableLike$anonfun$flatMap$1.apply(TraversableLike.scala:241) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241) at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104) at org.apache.spark.deploy.DependencyUtils$.resolveGlobPaths(DependencyUtils.scala:145) at org.apache.spark.deploy.SparkSubmit$anonfun$prepareSubmitEnvironment$5.apply(SparkSubmit.scala:356) at org.apache.spark.deploy.SparkSubmit$anonfun$prepareSubmitEnvironment$5.apply(SparkSubmit.scala:356) at scala.Option.map(Option.scala:146) at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:356) at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:143) at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86) at org.apache.spark.deploy.SparkSubmit$anon$2.doSubmit(SparkSubmit.scala:924) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.net.URISyntaxException: Illegal character in scheme name at index 0: dbfs: at java.net.URI$Parser.fail(URI.java:2848) at java.net.URI$Parser.checkChars(URI.java:3021) at java.net.URI$Parser.checkChar(URI.java:3031) at java.net.URI$Parser.parse(URI.java:3047) at java.net.URI.<init>(URI.java:746) at org.apache.hadoop.fs.Path.initialize(Path.java:202)

3 REPLIES 3

shyam_9
Valued Contributor
Valued Contributor

Hi @Nandha Kumar,

please go through the below docs to pass python files as job,

https://docs.databricks.com/dev-tools/api/latest/jobs.html#sparkpythontask

NandhaKumar
New Contributor II

Thanks @Shyamprasad Miryala​  . But how to give more than one files for --py-files which are present in DBFS ?

shyam_9
Valued Contributor
Valued Contributor

If you depend on multiple Python files we recommend packaging them into a .zip or .egg.

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group