Spark executor rdd.pipe call not finding file location that exists in SparkFiles.get()

mick042
New Contributor III

In a Databricks notebook, I need to run text files through a function from an external library (reading stdin, writing stdout). I have used sparkContext.addFile({external_library_name}) to add the external library so that it is available to all executors.

When I run SparkFiles.get({external_library_name}), it returns the executor path to the external library. But when I use that SparkFiles.get({external_library_name}) location as part of an rdd.pipe call with concatenated params, I get a FileNotFoundError.

from pyspark import SparkFiles

# files_list and env_var_val are defined earlier in the notebook
spark.sparkContext.addFile("/dbfs/FileStore/Custom_Executable")
files_rdd = spark.sparkContext.parallelize(files_list)

print(f'spark file path:{SparkFiles.get("Custom_Executable")}')

path_with_params = (
    SparkFiles.get("Custom_Executable")
    + " the-function-name --to company1 --from company2 -input - -output -"
)

print(f'path with params: {path_with_params}')

pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})
print(pipe_rdd.collect())

The output from this is:

spark file path:/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable
exe path: /local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable the-function-name --to company1 --from company2 -input - -output -

Output

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 12) (10.113.4.168 executor 0): org.apache.spark.api.python.PythonException: 'FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable''. Full traceback below:

Why is the pipe call not finding the location returned by SparkFiles.get?

1 ACCEPTED SOLUTION


mick042
New Contributor III

Thanks Kaniz, yes I tried that. Did not work. Falling back on init scripts now and that works.


3 REPLIES

Kaniz
Community Manager

Hi @Michael Lennon, try adding "file:///" before SparkFiles.get() on the line where you build path_with_params.
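
For reference, that suggestion amounts to roughly the following change to the snippet above (a sketch only; it just prefixes the path returned by SparkFiles.get() with the file:// scheme, reusing files_rdd and env_var_val from the original code):

from pyspark import SparkFiles

# Sketch of the suggestion: prepend the file:// scheme to the local path
# returned by SparkFiles.get() before building the pipe command.
exe_uri = "file://" + SparkFiles.get("Custom_Executable")
path_with_params = (
    exe_uri
    + " the-function-name --to company1 --from company2 -input - -output -"
)
pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})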

mick042
New Contributor III

Thanks Kaniz, yes I tried that. Did not work. Falling back on init scripts now and that works.
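
For anyone landing here later, an init-script fallback along these lines makes the executable available at a fixed local path on every node, so rdd.pipe no longer depends on SparkFiles.get at all (a sketch only; the script location and the /usr/local/bin target below are illustrative assumptions, not necessarily what was used here):

# Write a cluster init script to DBFS; it runs on every node at cluster start.
init_script = """#!/bin/bash
cp /dbfs/FileStore/Custom_Executable /usr/local/bin/Custom_Executable
chmod +x /usr/local/bin/Custom_Executable
"""
dbutils.fs.put("dbfs:/databricks/init-scripts/install_custom_executable.sh",
               init_script, overwrite=True)

# After adding the script under the cluster's Advanced options > Init Scripts
# and restarting, the pipe command can use the fixed local path directly:
path_with_params = ("/usr/local/bin/Custom_Executable the-function-name "
                    "--to company1 --from company2 -input - -output -")
pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})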

Kaniz
Community Manager

@Michael Lennon, Awesome!

Thanks for sharing the update.

Would you mind marking your answer as the best?
