Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Spark executor RDD.pipe call not finding file location returned by SparkFiles.get()

mick042
New Contributor III

In a Databricks notebook, I need to run text files (stdin/stdout) through a function from an external library. I have used sparkContext.addFile({external_library_name}) to add the external library so that it is available to all executors.

When I run SparkFiles.get({external_library_name}), it returns the executor path to the external library. When I use that SparkFiles.get({external_library_name}) location as part of an RDD.pipe call with concatenated params, I get a FileNotFoundError.

from pyspark import SparkFiles

spark.sparkContext.addFile("/dbfs/FileStore/Custom_Executable")
files_rdd = spark.sparkContext.parallelize(files_list)

print(f'spark file path: {SparkFiles.get("Custom_Executable")}')

path_with_params = (
    SparkFiles.get("Custom_Executable")
    + " the-function-name --to company1 --from company2 -input - -output -"
)

print(f'path with params: {path_with_params}')

pipe_rdd = files_rdd.pipe(path_with_params, env={'SOME_ENV_VAR': env_var_val})
print(pipe_rdd.collect())

The output from this is:

spark file path:/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable
exe path: /local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable the-function-name --to company1 --from company2 -input - -output -
Output:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 0.0 failed 4 times, most recent failure: Lost task 2.3 in stage 0.0 (TID 12) (10.113.4.168 executor 0): org.apache.spark.api.python.PythonException: 'FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/spark-c69e5328-9da3-4c76-85b8-a977e470909d/userFiles-e8a37109-046c-4909-8dd2-95bde5c9f3e3/Custom_Executable''. Full traceback below:

Why is the pipe call not finding the location returned by SparkFiles.get?

1 ACCEPTED SOLUTION


mick042
New Contributor III

Thanks Kaniz, yes I tried that. It did not work. I'm falling back on init scripts now, and that works.
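For readers landing here, the init-script fallback can look something like the following cluster-scoped script, which copies the executable from DBFS onto every node's local disk so it exists at the same path cluster-wide. The destination path is illustrative, not from the post:

```shell
#!/bin/bash
# Cluster-scoped init script: runs on every node (driver and executors),
# so the executable is present at a fixed local path before any job starts.
cp /dbfs/FileStore/Custom_Executable /usr/local/bin/Custom_Executable
chmod +x /usr/local/bin/Custom_Executable
```

With this in place, RDD.pipe can be given the fixed path /usr/local/bin/Custom_Executable directly, with no SparkFiles.get lookup needed.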


