
PGP Encryption / Decryption in Databricks

SreedharVengala
New Contributor III

Is there a way to decrypt / encrypt blob files in Databricks using a key stored in Key Vault?

What libraries need to be used?

Any code snippets? Links?

18 REPLIES

Kaniz
Community Manager

Hi @ SreedharVengala! My name is Kaniz, and I'm the technical moderator here. Great to meet you, and thanks for your question! Let's see if your peers on the Forum have an answer to your questions first. Or else I will follow up shortly with a response.

SreedharVengala
New Contributor III

Thanks Kaniz, we got this working now using gnupg within a Databricks notebook.

Regards

Sreedhar

That's awesome!

Enjoy!

Hi @Sreedhar Vengala, we are trying to do the same. Can you share some sample code showing how to achieve it? Also, did you pick up the files from ADLS and place the encrypted files back in ADLS folders?

Hi @Sreedhar Vengala, would you like to share the sample code here? It would help other users.

Databricks_PGP_
New Contributor III

Hi Team, could anyone please help me with how to decrypt PGP keys using Azure Key Vault in an Azure Databricks notebook?

Hi @Shobhit Awasthi​ , You can use the below snippet of code which doesn't need you to access anything from any directory other than the files you plan to encrypt and the keys. In the code, you have a public key stored in the key vault as a secret named 'publicb64'. If you want to read the ASC version from somewhere you can just read it into KEY_PUB. Don't forget to install pgpy using pip install pgpy.

# Encrypting a file using a public key
import base64
import io
from timeit import default_timer as timer

import pgpy

# publicb64 must already hold the base64-encoded, ASCII-armored public key read
# from the 'publicb64' secret, for example via
# dbutils.secrets.get(scope="<your-scope>", key="publicb64") (scope name is a placeholder).
KEY_PUB = base64.b64decode(publicb64).decode("ascii").lstrip()

pub_key = pgpy.PGPKey()
pub_key.parse(KEY_PUB)

# Read the file from the mount point; io.open with newline='' retains the CRLF
with io.open('/dbfs/mnt/sample_data/california_housing_test.csv', "r", newline='') as csv_file:
    input_data = csv_file.read()

t0 = timer()
# PGP encryption start
msg = pgpy.PGPMessage.new(input_data)
# encrypt() returns a new PGPMessage that contains an encrypted form of the original message
encrypted_message = pub_key.encrypt(msg)
pgpstr = str(encrypted_message)
with open('/dbfs/mnt/sample_data/california_housing_test.csv.pgp', "w") as text_file:
    text_file.write(pgpstr)
print("Encryption Complete :" + str(timer() - t0))

@Kaniz Fatma​ 

Could you please share the decryption script on how to decrypt PGP keys using Azure Keyvault in Azure Databricks notebook.

Kaniz
Community Manager

Hi @Shobhit Awasthi​ , This might help you.

https://github.com/lfalck/AzureFunctionsPGPDecrypt

@Kaniz Fatma​ 

We are looking to decrypt using an Azure Databricks notebook. The above GitHub link uses Azure Functions and .NET libraries.

Could you please help on the same ?
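For readers looking for the notebook-side counterpart of the encryption snippet earlier in this thread, here is a minimal decryption sketch using the same pgpy library. It is not the original poster's code. It assumes the private key is stored base64-encoded in a secret in the same way as 'publicb64'. The helper names, the secret names 'privateb64' and 'pgp-passphrase', the scope placeholder, and the output path are all hypothetical.

```python
import base64


def decode_b64_secret(secret_b64: str) -> str:
    """Decode a base64-encoded, ASCII-armored key back into armored text."""
    return base64.b64decode(secret_b64).decode("ascii").lstrip()


def decrypt_pgp_file(enc_path: str, out_path: str,
                     private_key_armored: str, passphrase: str) -> None:
    """Decrypt an armored .pgp file with a (possibly passphrase-protected) private key."""
    # Third-party library (pip install pgpy); imported here so that the helper
    # above only needs the standard library.
    import pgpy

    priv_key, _ = pgpy.PGPKey.from_blob(private_key_armored)
    enc_msg = pgpy.PGPMessage.from_file(enc_path)

    if priv_key.is_protected:
        # The key material is only usable inside the unlock() context.
        with priv_key.unlock(passphrase):
            decrypted = priv_key.decrypt(enc_msg)
    else:
        decrypted = priv_key.decrypt(enc_msg)

    plaintext = decrypted.message  # str for text messages, bytearray for binary ones
    if isinstance(plaintext, (bytes, bytearray)):
        plaintext = bytes(plaintext).decode("utf-8")

    # newline="" writes the line endings exactly as they were encrypted.
    with open(out_path, "w", newline="") as out_file:
        out_file.write(plaintext)


# Hypothetical notebook usage (secret and scope names are assumptions):
# private_b64 = dbutils.secrets.get(scope="<your-scope>", key="privateb64")
# passphrase = dbutils.secrets.get(scope="<your-scope>", key="pgp-passphrase")
# decrypt_pgp_file(
#     "/dbfs/mnt/sample_data/california_housing_test.csv.pgp",
#     "/dbfs/mnt/sample_data/california_housing_test.decrypted.csv",
#     decode_b64_secret(private_b64),
#     passphrase,
# )
```

Keeping both the private key and its passphrase in the secret scope means neither has to appear in the notebook source.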

Kaniz
Community Manager

Alwin1
New Contributor II

@Kaniz Fatma​ 

Is the key generated via OpenPGP (Private and Public) or Python?

Kaniz
Community Manager

Hi @Alwin Lee, it is a private key, created along with the public key and passphrase at the time of encryption.

Source

Alwin1
New Contributor II

@Kaniz Fatma​ 

Thank you.

Where were the key pairs generated? Was it Python in Databricks or OpenPGP that created them?
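The thread does not say how the keys in the earlier snippet were produced. As one possibility, a key pair can be generated in Python with pgpy itself; any OpenPGP-compatible tool such as GnuPG would work as well. The sketch below is an illustration under that assumption. The function names, the RSA key size, and the algorithm preferences are choices made for this example, not something confirmed by the posters.

```python
import base64


def to_secret_value(armored_key: str) -> str:
    """Base64-encode an ASCII-armored key so it can be stored as a single-line secret."""
    return base64.b64encode(armored_key.encode("ascii")).decode("ascii")


def generate_key_pair(name: str, email: str, passphrase: str, key_size: int = 4096):
    """Return (armored_private_key, armored_public_key) for a new RSA key."""
    # Third-party library (pip install pgpy); imported here so that the helper
    # above only needs the standard library.
    import pgpy
    from pgpy.constants import (CompressionAlgorithm, HashAlgorithm, KeyFlags,
                                PubKeyAlgorithm, SymmetricKeyAlgorithm)

    key = pgpy.PGPKey.new(PubKeyAlgorithm.RSAEncryptOrSign, key_size)
    uid = pgpy.PGPUID.new(name, email=email)
    # The usage flags must include the encryption flags, otherwise the public
    # key cannot be used for encrypting messages.
    key.add_uid(
        uid,
        usage={KeyFlags.Sign, KeyFlags.EncryptCommunications, KeyFlags.EncryptStorage},
        hashes=[HashAlgorithm.SHA256],
        ciphers=[SymmetricKeyAlgorithm.AES256],
        compression=[CompressionAlgorithm.ZLIB, CompressionAlgorithm.Uncompressed],
    )
    # Protect the private key with a passphrase before exporting it.
    key.protect(passphrase, SymmetricKeyAlgorithm.AES256, HashAlgorithm.SHA256)
    return str(key), str(key.pubkey)
```

The armored public key passed through `to_secret_value` would give a value in the same base64 form that the encryption snippet expects in the 'publicb64' secret.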
