
PGP Encryption / Decryption in Databricks

SreedharVengala
New Contributor III

Is there a way to decrypt/encrypt blob files in Databricks using a key stored in Azure Key Vault?

What libraries need to be used?

Any code snippets? Links?
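A minimal first step, assuming the key sits in an Azure Key Vault-backed secret scope and is read with dbutils.secrets.get(scope=..., key=...). Some teams store the armored key base64-encoded in a single-line secret (one reply below keeps it in a variable named publicb64), so the hypothetical helper normalize_armored_key accepts either the raw ASCII-armored text or its base64 encoding. The scope and secret names in the comment are placeholders.

```python
import base64
import binascii

# Every ASCII-armored OpenPGP block starts with this prefix
# (e.g. "-----BEGIN PGP PUBLIC KEY BLOCK-----").
ARMOR_PREFIX = "-----BEGIN PGP "


def normalize_armored_key(secret_value: str) -> str:
    """Return ASCII-armored PGP key text from a secret value.

    Hypothetical helper: accepts either the armored text itself or a
    base64 encoding of it (a common way to keep a multi-line key in a
    single-line secret).
    """
    text = secret_value.strip()
    if text.startswith(ARMOR_PREFIX):
        return text
    compact = "".join(text.split())  # tolerate line-wrapped base64
    try:
        decoded = base64.b64decode(compact, validate=True).decode("utf-8").strip()
    except (binascii.Error, UnicodeDecodeError) as exc:
        raise ValueError("secret is neither armored PGP text nor base64 of it") from exc
    if not decoded.startswith(ARMOR_PREFIX):
        raise ValueError("decoded secret is not an ASCII-armored PGP block")
    return decoded


# In a Databricks notebook (placeholder scope/secret names):
# raw = dbutils.secrets.get(scope="kv-scope", key="pgp-public-key")
# armored = normalize_armored_key(raw)
```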

10 replies

SreedharVengala
New Contributor III

Thanks, Kaniz. We got this working using gnupg within a Databricks notebook.

Regards

Sreedhar
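The reply does not include the notebook code. As a rough sketch only, assuming the gpg command-line tool (GnuPG 2.1 or later) is installed on the cluster (it may need an init script) and that the keys have already been written to local files, the import, encrypt, and decrypt steps could be driven from Python with subprocess. The function names are hypothetical; the gpg options themselves are standard GnuPG options.

```python
import subprocess
from typing import List, Optional


def gpg_import_cmd(key_file: str) -> List[str]:
    # Import an ASCII-armored public or private key into the default keyring.
    return ["gpg", "--batch", "--yes", "--import", key_file]


def gpg_encrypt_cmd(recipient: str, src: str, dst: str) -> List[str]:
    # --trust-model always skips the interactive trust check that would
    # otherwise stop batch encryption to a freshly imported key.
    return ["gpg", "--batch", "--yes", "--trust-model", "always",
            "--recipient", recipient, "--output", dst, "--encrypt", src]


def gpg_decrypt_cmd(src: str, dst: str) -> List[str]:
    # The passphrase is read from stdin (fd 0); loopback pinentry lets gpg
    # accept it without an interactive prompt.
    return ["gpg", "--batch", "--yes", "--pinentry-mode", "loopback",
            "--passphrase-fd", "0", "--output", dst, "--decrypt", src]


def run_gpg(cmd: List[str], passphrase: Optional[str] = None) -> None:
    # gpg opens src and dst itself, so file contents never pass through
    # Python memory; stderr is captured for CalledProcessError on failure.
    subprocess.run(cmd, input=passphrase, text=True, check=True,
                   capture_output=True)


# Hypothetical local paths and recipient:
# run_gpg(gpg_import_cmd("/tmp/public.asc"))
# run_gpg(gpg_encrypt_cmd("partner@example.com", "/tmp/data.csv", "/tmp/data.csv.gpg"))
```

The --trust-model always option is a convenience for batch jobs; a stricter setup would instead mark the imported key as trusted.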

Hi @Sreedhar Vengala, we are trying to do the same. Can you share some sample code showing how you achieved this? Also, did you pick up the files from ADLS and write the encrypted files back to ADLS folders?
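gpg works on local file paths, so one possible way to read from and write back to ADLS is through a mount or Unity Catalog volume exposed on the driver's local filesystem (on clusters with the /dbfs FUSE mount, dbfs:/mnt/... is visible as /dbfs/mnt/...). The sketch below assumes such a mount or volume exists; the helper names and the .gpg suffix convention are hypothetical.

```python
import posixpath


def to_local_path(uri: str) -> str:
    """Map a DBFS URI to the driver-local path that gpg can open.

    Assumes the ADLS container is mounted under dbfs:/mnt/... or exposed
    as a Unity Catalog volume under /Volumes/...
    """
    if uri.startswith("dbfs:/"):
        return "/dbfs/" + uri[len("dbfs:/"):].lstrip("/")
    if uri.startswith("/dbfs/") or uri.startswith("/Volumes/"):
        return uri
    raise ValueError(f"not a DBFS or volume path: {uri}")


def encrypted_target(src_uri: str, out_dir_uri: str, suffix: str = ".gpg") -> str:
    # Keep the original file name and append the suffix,
    # e.g. sales.csv -> sales.csv.gpg in the output folder.
    name = posixpath.basename(src_uri.rstrip("/"))
    return posixpath.join(to_local_path(out_dir_uri), name + suffix)
```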

Databricks_PGP_
New Contributor III

Hi team, could anyone please help me with how to decrypt using PGP keys stored in Azure Key Vault in an Azure Databricks notebook?

@Kaniz Fatma

Could you please share a script showing how to decrypt using PGP keys stored in Azure Key Vault in an Azure Databricks notebook?

@Kaniz Fatma

We are looking to decrypt in an Azure Databricks notebook. The GitHub link above uses Azure Functions and .NET libraries.

Could you please help with this?

Alwin1
New Contributor II

@Kaniz Fatma

Was the key pair (private and public) generated with OpenPGP or with Python?

Alwin1
New Contributor II

@Kaniz Fatma

Thank you.

Where were the key pairs generated? Was it Python in Databricks or OpenPGP that created them?

TKD
New Contributor II

Hi @Kaniz Fatma,

I used the code you provided above to encrypt the file; however, I am facing the following issues:

  1. I generated a key in the Azure portal (Key vault -> Keys -> Generate new key), downloaded it (Download public key), and stored it in a secret, which I retrieved in Databricks using dbutils.secrets.get and stored in a variable called publicb64. The downloaded public key was a .PEM file that began with "-----BEGIN PUBLIC KEY-----", followed by a long alphanumeric string, and ended with "-----END PUBLIC KEY-----". Executing pub_key.parse(KEY_PUB) raises ValueError: Expected: ASCII-armored PGP.
  2. I used the Kleopatra software to generate a PGP key pair, which was saved as an .asc file, and uploaded it to a new secret in Key Vault. I fetched this key in Databricks the same way. It had the format "-----BEGIN PGP PRIVATE KEY BLOCK-----", a long alphanumeric string, then "-----END PGP PRIVATE KEY BLOCK-----". With this key, I get an error at the step encrypted_message = pub_key.encrypt(msg): "PGPError: Expected: is_public == True. Got: False".

What file format does the public key need to be in for this code to work? Is there a specific tool you would recommend for generating the public key file? Your advice would be highly appreciated.
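Both errors above come down to the key's format. A PEM block beginning with "-----BEGIN PUBLIC KEY-----" is an X.509-style (SubjectPublicKeyInfo) RSA/EC key, which is what Azure Key Vault keys export; it is not an OpenPGP key, so an OpenPGP parser rejects it. Kleopatra's "BEGIN PGP PRIVATE KEY BLOCK" file is the secret key, while encryption needs the public key: either a separately exported "BEGIN PGP PUBLIC KEY BLOCK" file, or, in PGPy, the public half of the private key. The stdlib-only sketch below (hypothetical helper names) checks the armor header before parsing so the mismatch is reported up front.

```python
from typing import Optional

# Armor headers for the key formats mentioned in this thread.
KNOWN_HEADERS = {
    "-----BEGIN PGP PUBLIC KEY BLOCK-----": "pgp-public",
    "-----BEGIN PGP PRIVATE KEY BLOCK-----": "pgp-private",
    "-----BEGIN PUBLIC KEY-----": "pem-public",  # e.g. Azure Key Vault export; not OpenPGP
}


def key_kind(key_text: str) -> Optional[str]:
    """Return the key kind from its first armor line, or None if unknown."""
    stripped = key_text.strip()
    if not stripped:
        return None
    first_line = stripped.splitlines()[0].strip()
    return KNOWN_HEADERS.get(first_line)


def check_encryption_key(key_text: str) -> None:
    # Fail early with a message explaining what to use instead.
    kind = key_kind(key_text)
    if kind == "pem-public":
        raise ValueError("PEM public key is not OpenPGP; use an OpenPGP public key (.asc)")
    if kind == "pgp-private":
        raise ValueError("private key block given; encrypt with its public half")
    if kind != "pgp-public":
        raise ValueError("unrecognized key format")
```

If only the private key block is available, PGPy should be able to load it with PGPKey.from_blob(text), which returns a (key, others) tuple, and then encrypt with key.pubkey.encrypt(PGPMessage.new(data)).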

This blog will help: https://medium.com/@anupamchand/pgp-encryption-using-python-in-azure-databricks-ef4bd56145ed. We used a bash script within Databricks to get this working. Bash is good for large files; we tested files up to 2 GB and it worked fine. With plain Python, you will run into out-of-memory (OOM) errors.
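To illustrate why the shell approach scales: gpg processes its input as a stream, so the data passes through in chunks instead of being held in memory all at once. A minimal Python sketch of the same idea, assuming the gpg binary is installed, pipes a file through a subprocess using open file handles; the function name is hypothetical and any streaming command can stand in for gpg.

```python
import subprocess
from typing import List


def stream_through(cmd: List[str], src_path: str, dst_path: str) -> None:
    """Pipe src_path through cmd into dst_path without loading it into memory.

    The child process reads from and writes to the open file descriptors
    directly, so Python's memory use does not grow with file size.
    """
    with open(src_path, "rb") as fin, open(dst_path, "wb") as fout:
        subprocess.run(cmd, stdin=fin, stdout=fout, check=True)


# Hypothetical gpg usage (recipient and paths are placeholders); with no
# file argument, gpg reads stdin and writes the result to stdout:
# stream_through(["gpg", "--batch", "--trust-model", "always",
#                 "--recipient", "partner@example.com", "--encrypt"],
#                "/dbfs/mnt/raw/big.csv", "/dbfs/mnt/encrypted/big.csv.gpg")
```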

Anonymous
Not applicable

I have a similar requirement and am exploring options to encrypt/decrypt ADLS data using PySpark in Azure Databricks (ADB). Please share a list of the available options.
