08-14-2024 01:21 PM
I have around 20 PGP files in a folder in my volume that I need to decrypt. I have a decryption function that accepts a file name and writes the decrypted file to a new folder in the same volume.
I thought I could create a Spark dataframe with the name of each file, wrap my decryption function in a UDF, and apply it to each row. I expected this to decrypt the files in parallel; the files are very large, so processing them one at a time is slow.
The issue is that even though the files are there and I can run the process on an individual file with no problems, when I apply the UDF to the dataframe of file names I get a 'PermissionError: [Errno 13] Permission denied: "Path_to_file.pgp"' error.
Is this not something I can do with Spark?
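Roughly what I'm trying, simplified; the paths and the body of `decrypt_file` are placeholders for my real setup:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

def decrypt_file(path: str) -> str:
    """Stand-in for my real function: decrypts `path` and writes the
    output to a new folder in the same volume, returning the new path."""
    out_path = path.replace("/encrypted/", "/decrypted/").removesuffix(".pgp")
    # ... actual PGP decryption of `path` into `out_path` goes here ...
    return out_path

# One row per file to decrypt (placeholder paths).
files = ["/Volumes/catalog/schema/vol/encrypted/file_01.pgp"]  # ~20 of these
df = spark.createDataFrame([(f,) for f in files], ["path"])

decrypt_udf = udf(decrypt_file, StringType())

# This is the step that raises the PermissionError on the workers.
df.withColumn("out_path", decrypt_udf("path")).collect()
```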
Labels: Spark
Accepted Solutions
08-14-2024 04:07 PM
The error happens because the UDF runs on Spark's worker nodes, which can't access your local files the way the driver can. Instead of using Spark to decrypt, do it outside of Spark with Python's multiprocessing module or a simple batch script for parallel processing; a sketch is below. Another option is to move the files to shared storage such as HDFS or S3 so all nodes can access them. Spark works best for processing data, not handling files directly.
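For example, a minimal multiprocessing sketch; `decrypt_file` and the glob pattern are placeholders for your own function and folder:

```python
import glob
from multiprocessing import Pool

def decrypt_file(path: str) -> None:
    """Placeholder for your existing decryption function, which takes a
    file path and writes the decrypted output to another folder."""
    ...

if __name__ == "__main__":
    # Adjust the pattern to wherever your encrypted files live.
    files = glob.glob("/path/to/encrypted/*.pgp")
    # Decrypt up to 4 files at a time in separate processes;
    # tune the pool size to your CPU count and disk throughput.
    with Pool(processes=4) as pool:
        pool.map(decrypt_file, files)
```

Since PGP decryption is typically CPU-bound, separate processes sidestep the GIL; if your function mostly waits on disk or network I/O, a thread pool would work just as well.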