<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Spark Executor Rdd.Pipe call not finding file location that exists in Sparkfiles.get() in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/spark-executor-rdd-pipe-call-not-finding-file-location-that/m-p/17882#M11805</link>
    <description>&lt;P&gt;In a Databricks notebook, I need to pipe text files (via stdin and stdout) through a function in an external library. I used sparkContext.addFile({external_library_name}) to add the external library so that it is available to all executors.&lt;/P&gt;&lt;P&gt;When I run SparkFiles.get({external_library_name}), it returns the executor path to the external library. When I use that path as part of an RDD.pipe call with concatenated parameters, I get a FileNotFoundError.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark import SparkFiles

spark.sparkContext.addFile("/dbfs/FileStore/Custom_Executable")
files_rdd = spark.sparkContext.parallelize(files_list)

print(f'spark file path:{SparkFiles.get("Custom_Executable")}')

path_with_params = SparkFiles.get("Custom_Executable") + " the-function-name  --to company1 --from  company2 -input - -output -"

print(f'path with params: {path_with_params}')

pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})
print(pipe_rdd.collect())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The output from this is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark file path:/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable
exe path: /local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable the-function-name  --to company1 --from  company2 -input - -output -
Output
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 12) (10.113.4.168 executor 0): org.apache.spark.api.python.PythonException: 'FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable''. Full traceback below:&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Why is the pipe call not finding the location returned by SparkFiles.get?&lt;/P&gt;</description>
    <pubDate>Fri, 10 Jun 2022 23:12:40 GMT</pubDate>
    <dc:creator>mick042</dc:creator>
    <dc:date>2022-06-10T23:12:40Z</dc:date>
    <item>
      <title>Spark Executor Rdd.Pipe call not finding file location that exists in Sparkfiles.get()</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-executor-rdd-pipe-call-not-finding-file-location-that/m-p/17882#M11805</link>
      <description>&lt;P&gt;In a Databricks notebook, I need to pipe text files (via stdin and stdout) through a function in an external library. I used sparkContext.addFile({external_library_name}) to add the external library so that it is available to all executors.&lt;/P&gt;&lt;P&gt;When I run SparkFiles.get({external_library_name}), it returns the executor path to the external library. When I use that path as part of an RDD.pipe call with concatenated parameters, I get a FileNotFoundError.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;from pyspark import SparkFiles

spark.sparkContext.addFile("/dbfs/FileStore/Custom_Executable")
files_rdd = spark.sparkContext.parallelize(files_list)

print(f'spark file path:{SparkFiles.get("Custom_Executable")}')

path_with_params = SparkFiles.get("Custom_Executable") + " the-function-name  --to company1 --from  company2 -input - -output -"

print(f'path with params: {path_with_params}')

pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})
print(pipe_rdd.collect())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;The output from this is:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;spark file path:/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable
exe path: /local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable the-function-name  --to company1 --from  company2 -input - -output -
Output
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 12) (10.113.4.168 executor 0): org.apache.spark.api.python.PythonException: 'FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable''. Full traceback below:&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Why is the pipe call not finding the location returned by SparkFiles.get?&lt;/P&gt;</description>
      <pubDate>Fri, 10 Jun 2022 23:12:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-executor-rdd-pipe-call-not-finding-file-location-that/m-p/17882#M11805</guid>
      <dc:creator>mick042</dc:creator>
      <dc:date>2022-06-10T23:12:40Z</dc:date>
    </item>
    <item>
      <title>Re: Spark Executor Rdd.Pipe call not finding file location that exists in Sparkfiles.get()</title>
      <link>https://community.databricks.com/t5/data-engineering/spark-executor-rdd-pipe-call-not-finding-file-location-that/m-p/17884#M11807</link>
      <description>&lt;P&gt;Thanks, Kaniz. Yes, I tried that, but it did not work. I am falling back on init scripts now, and that works.&lt;/P&gt;</description>
      <pubDate>Tue, 14 Jun 2022 12:12:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/spark-executor-rdd-pipe-call-not-finding-file-location-that/m-p/17884#M11807</guid>
      <dc:creator>mick042</dc:creator>
      <dc:date>2022-06-14T12:12:25Z</dc:date>
    </item>
  </channel>
</rss>

