Data Engineering

Question: Decrypt many files with UDF

ggsmith
New Contributor III

I have around 20 PGP files in a folder in a volume that I need to decrypt. I have a decryption function that accepts a file name and writes the decrypted file to a new folder in the same volume.

I thought I could create a Spark DataFrame with the name of each file, wrap my decryption function in a UDF, and apply it to each row. I expected this to decrypt the files in parallel; they are very large, so decrypting them one at a time takes a long time.
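
A minimal sketch of that approach, assuming a placeholder decrypt_file helper and illustrative volume paths (not the actual code):

import os
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()

# Illustrative paths; the real catalog, schema, and volume names differ.
SRC_DIR = "/Volumes/my_catalog/my_schema/my_volume/encrypted"
DST_DIR = "/Volumes/my_catalog/my_schema/my_volume/decrypted"

def decrypt_file(path: str) -> str:
    """Placeholder for the real PGP decryption routine."""
    out_path = os.path.join(DST_DIR, os.path.splitext(os.path.basename(path))[0])
    # ... decrypt `path` and write the plaintext to `out_path` ...
    return out_path

decrypt_udf = udf(decrypt_file, StringType())

files = [os.path.join(SRC_DIR, f) for f in os.listdir(SRC_DIR) if f.endswith(".pgp")]
df = spark.createDataFrame([(f,) for f in files], ["path"])

# Applying the UDF runs decrypt_file on the executors, which is where the
# PermissionError shows up.
df.withColumn("decrypted_path", decrypt_udf("path")).collect()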

The issue is that even though the file is there and I can run the process on an individual file without problems, when I apply the UDF to the DataFrame of file names I get a 'PermissionError: [Errno 13] Permission denied: "Path_to_file.pgp"' message.

Is this not something I can do with Spark?

1 ACCEPTED SOLUTION


Brahmareddy
Valued Contributor III

The error happens because the Spark worker nodes can't access the files the way your driver can. Instead of decrypting inside Spark, do it outside of Spark: use Python's multiprocessing, or a simple batch script, to run several decryptions in parallel. Another option is to move the files to storage every node can reach, such as HDFS or S3. Spark works best for processing data, not for handling files directly.
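
For example, a driver-side sketch along those lines using concurrent.futures from the standard library; the decrypt_file helper and volume paths are assumptions, not the original code:

import glob
import os
from concurrent.futures import ProcessPoolExecutor

# Illustrative paths; adjust to the real volume.
SRC_DIR = "/Volumes/my_catalog/my_schema/my_volume/encrypted"
DST_DIR = "/Volumes/my_catalog/my_schema/my_volume/decrypted"

def decrypt_file(path: str) -> str:
    """Placeholder for the real PGP decryption routine."""
    out_path = os.path.join(DST_DIR, os.path.splitext(os.path.basename(path))[0])
    # ... decrypt `path` and write the plaintext to `out_path` ...
    return out_path

if __name__ == "__main__":
    files = glob.glob(os.path.join(SRC_DIR, "*.pgp"))
    # Decrypt several files at once on the driver; tune max_workers to the
    # driver's CPU count and the size of the files. If decryption mostly waits
    # on I/O or an external gpg process, a ThreadPoolExecutor works too.
    with ProcessPoolExecutor(max_workers=4) as pool:
        for out in pool.map(decrypt_file, files):
            print("decrypted to", out)

With roughly 20 large files, this keeps the work on a single machine but still overlaps the decryption of several files at a time.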

