PGP Encryption / Decryption in Databricks

SreedharVengala
New Contributor III

Is there a way to decrypt / encrypt blob files in Databricks using a key stored in Key Vault?

What libraries need to be used?

Any code snippets? Links?

18 REPLIES

Kaniz
Community Manager

TKD
New Contributor II

Hi @Kaniz Fatma,

I used the code you gave above to encrypt the file, but I am running into the following issues:

  1. I generated a key in the Azure portal (Key Vault -> Keys -> Generate new key), downloaded it (Download public key), and stored it in a secret, which I retrieved in Databricks with dbutils.secrets.get into a variable called publicb64. The downloaded public key was a .PEM file whose contents begin with "-----BEGIN PUBLIC KEY-----", followed by a long alphanumeric string, and end with "-----END PUBLIC KEY-----". Executing pub_key.parse(KEY_PUB) raises ValueError: Expected: ASCII-armored PGP.
  2. I used Kleopatra to generate a PGP key pair, which produced an .asc file that I uploaded to a new secret in Key Vault and fetched in Databricks the same way. Its contents have the form "-----BEGIN PGP PRIVATE KEY BLOCK-----", a long alphanumeric string, "-----END PGP PRIVATE KEY BLOCK-----". With this key, the call encrypted_message = pub_key.encrypt(msg) fails with "PGPError: Expected: is_public == True. Got: False".

I am looking for the actual public key file format that works with this code. Is there a specific tool or source you would recommend for generating the public key file? Your advice would be highly appreciated.
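Both errors above are consistent with the key text not being an OpenPGP public key: a "-----BEGIN PUBLIC KEY-----" file is a generic PEM public key rather than OpenPGP armor, and a "PGP PRIVATE KEY BLOCK" is the secret half of a key pair. An OpenPGP public key export normally starts with "-----BEGIN PGP PUBLIC KEY BLOCK-----". Below is a minimal sketch, using only the standard library, of a check that could be run on the secret's value before handing it to the PGP library. The function name `classify_key_text` and its return labels are hypothetical and not part of any library.

```python
import re

# Matches an armor/PEM header line such as "-----BEGIN PGP PUBLIC KEY BLOCK-----".
_HEADER_RE = re.compile(r"-----BEGIN ([A-Z0-9 ]+)-----")


def classify_key_text(key_text: str) -> str:
    """Return a rough label for the kind of key contained in key_text.

    The labels are illustrative only:
      "pgp-public"  - OpenPGP public key block (what encryption needs)
      "pgp-private" - OpenPGP private key block (needed for decryption)
      "pgp-other"   - some other OpenPGP armored object (message, signature)
      "pem-non-pgp" - a PEM object that is not OpenPGP (e.g. "PUBLIC KEY")
      "unknown"     - no recognizable header found
    """
    match = _HEADER_RE.search(key_text)
    if match is None:
        return "unknown"
    label = match.group(1)
    if label == "PGP PUBLIC KEY BLOCK":
        return "pgp-public"
    if label == "PGP PRIVATE KEY BLOCK":
        return "pgp-private"
    if label.startswith("PGP "):
        return "pgp-other"
    return "pem-non-pgp"


if __name__ == "__main__":
    sample = "-----BEGIN PUBLIC KEY-----\nMIIB...\n-----END PUBLIC KEY-----"
    print(classify_key_text(sample))
```

If the check reports "pgp-private", exporting the public key from Kleopatra (rather than the secret key backup) and storing that in the secret is the likely fix. If it reports "pem-non-pgp", the key was probably not generated by an OpenPGP tool.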

Edthehead
New Contributor III

This blog will help: https://medium.com/@anupamchand/pgp-encryption-using-python-in-azure-databricks-ef4bd56145ed. We used a bash script within Databricks to get this working. Bash works well for large files: we tested files up to 2 GB and it worked fine. With plain Python you will run into OOM errors.
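As a rough illustration of the command-line approach, here is a sketch that assumes the bash script calls the `gpg` CLI, that `gpg` is installed on the cluster, and that the recipient's public key has already been imported into the keyring. The helper names, the recipient address, and the `/dbfs/...` paths are hypothetical and are not taken from the blog. Because `gpg` reads and writes the files itself, the file contents never have to be loaded into Python memory.

```python
import subprocess
from typing import List


def build_gpg_encrypt_cmd(src_path: str, dst_path: str, recipient: str) -> List[str]:
    """Build a gpg command line that encrypts src_path for the given recipient."""
    return [
        "gpg",
        "--batch",                    # never prompt interactively
        "--yes",                      # overwrite the output file if it exists
        "--trust-model", "always",    # skip the trust prompt for the imported key
        "--recipient", recipient,
        "--output", dst_path,
        "--encrypt", src_path,
    ]


def encrypt_file(src_path: str, dst_path: str, recipient: str) -> None:
    """Run gpg; raises subprocess.CalledProcessError if gpg exits non-zero."""
    subprocess.run(build_gpg_encrypt_cmd(src_path, dst_path, recipient), check=True)


if __name__ == "__main__":
    # Hypothetical paths and recipient, shown only to illustrate the call shape.
    cmd = build_gpg_encrypt_cmd(
        "/dbfs/tmp/input.csv", "/dbfs/tmp/input.csv.gpg", "recipient@example.com"
    )
    print(" ".join(cmd))
```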

Anonymous
Not applicable

I have a similar requirement and am exploring options to encrypt/decrypt ADLS data using ADB PySpark. Please share the list of available options.
