In a Databricks notebook, I need to run text files (via stdin/stdout) through a function from an external library. I have used spark.sparkContext.addFile({external_library_name}) to distribute the library so that it is available to all executors.
When I call SparkFiles.get({external_library_name}), it returns the executor path to the library. But when I use that SparkFiles.get({external_library_name}) location as part of an RDD.pipe call with concatenated parameters, I get a FileNotFoundError.
from pyspark import SparkFiles

spark.sparkContext.addFile("/dbfs/FileStore/Custom_Executable")
files_rdd = spark.sparkContext.parallelize(files_list)

print(f'spark file path:{SparkFiles.get("Custom_Executable")}')
path_with_params = (
    SparkFiles.get("Custom_Executable")
    + " the-function-name --to company1 --from company2 -input - -output -"
)
print(f'exe path: {path_with_params}')

pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})
print(pipe_rdd.collect())
The output from this is:
spark file path:/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable
exe path: /local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable the-function-name --to company1 --from company2 -input - -output -
The error:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 12) (10.113.4.168 executor 0): org.apache.spark.api.python.PythonException:
'FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable''. Full traceback below:
Why is the pipe call not finding the location returned by SparkFiles.get?
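For reference, here is a hypothetical diagnostic (not part of my original code; it reuses files_rdd and the file name from above) that checks on the executors themselves whether the path returned by SparkFiles.get exists there and is executable:

from pyspark import SparkFiles
import os

def check_file_on_executor(_):
    # Resolve the path on this executor and report whether the file is
    # present there and has the execute bit set.
    path = SparkFiles.get("Custom_Executable")
    yield (path, os.path.exists(path), os.access(path, os.X_OK))

print(files_rdd.mapPartitions(check_file_on_executor).distinct().collect())

If such a check reported the file as missing on the workers, the failure would be in how the file is distributed rather than in the pipe command string itself.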