Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.


Hubert-Dudek
Esteemed Contributor III

Encrypt and decrypt personal data with Spark Databricks.

We create a table that will include personal information. However, we want to hide personal identifiers so no one can see them.

We set a key. The key needs to be 16, 24, or 32 bytes long (1 byte = 1 character for ASCII text). We use a widget for that, but only for development purposes. In production, we should store the key in Key Vault.
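To make the key-length rule concrete, here is a minimal Python sketch that checks a key before it is used. The helper name `key_to_bytes` is hypothetical and not part of the notebook, and it assumes the text key ends up as UTF-8 bytes, so "1 byte = 1 char" only holds for ASCII characters.

```python
VALID_AES_KEY_LENGTHS = (16, 24, 32)  # AES-128, AES-192, AES-256


def key_to_bytes(key: str) -> bytes:
    """Encode a text key as UTF-8 and check that it has a valid AES key length.

    Hypothetical helper for illustration; assumes the key is UTF-8 encoded.
    """
    raw = key.encode("utf-8")
    if len(raw) not in VALID_AES_KEY_LENGTHS:
        raise ValueError(
            f"AES key must be 16, 24, or 32 bytes, got {len(raw)} bytes"
        )
    return raw


# 16 ASCII characters give exactly 16 bytes
print(len(key_to_bytes("abcdefghijklmnop")))  # 16
```

A key such as "abcdefghijklmnoż" has 16 characters but 17 bytes in UTF-8, so this check rejects it.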

We insert data into the table and encrypt the phone field using the aes_encrypt function. The aes_encrypt and aes_decrypt functions are available since Databricks Runtime 10.3.

Now we can preview the data. Without decryption, the encrypted data is unreadable.

When we want to read it, we need to use the aes_decrypt function with our key.

👉 Please also watch my video about aes_encrypt and aes_decrypt:

➡️ https://www.youtube.com/watch?v=OGLf_PiFMks

Link to GitHub with the above notebook: https://github.com/hubert-dudek/databricks-hubert/blob/main/linkedin/decrypt%20encrypt/decrypt%20enc...

4 REPLIES

AmanSehgal
Honored Contributor III

@Hubert Dudek​  thank you for sharing this.

Is it possible to get this key from AWS KMS? Also, what are the other ways to efficiently encrypt and decrypt data using multiple keys and salt? Encrypting with one key could pose a high risk: if it gets leaked, the entire column or database will be exposed.

Hubert-Dudek
Esteemed Contributor III

Hi @Aman Sehgal​ 

  • Yes, it is possible to get a key from a key vault. I am using Azure Key Vault for that. However, the code doesn't look as neat as in the above example, because I need to use dbutils secrets and mix Python with SQL.
  • Multiple keys and salt: I think you need your own script/implementation for that.
  • The perfect solution would be to encrypt the whole column at the table creation level (like in Synapse dedicated SQL).
  • I know there will be some improvements for handling PID data once Unity Catalog is released (lineage, classification). However, I think the whole topic in Databricks is relatively young.
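As a rough idea of what such an own implementation for multiple keys and salt could look like, here is a minimal sketch that derives a separate AES key per column from one master key. Everything in it is an assumption for illustration: the `derive_key` helper, the column labels, and the use of PBKDF2-HMAC-SHA256 from the Python standard library as the derivation function (a dedicated KDF such as HKDF may be a better fit when the master key is already random).

```python
import hashlib
import os


def derive_key(master_key: bytes, salt: bytes, context: str, length: int = 32) -> bytes:
    """Derive a context-specific AES key from a master key.

    Hypothetical helper: PBKDF2-HMAC-SHA256 is used here only because it
    ships with the standard library.
    """
    return hashlib.pbkdf2_hmac(
        "sha256",
        master_key,
        salt + context.encode("utf-8"),  # the salt is extended with a per-column label
        100_000,                         # iteration count, chosen arbitrarily for the sketch
        dklen=length,                    # 16, 24, or 32 bytes for AES
    )


master_key = os.urandom(32)  # in practice this would come from a secret store
salt = os.urandom(16)        # random, not secret, stored alongside the data

# Hypothetical column labels; each column gets its own 32-byte key
phone_key = derive_key(master_key, salt, "people.phone")
email_key = derive_key(master_key, salt, "people.email")
```

With this approach a leaked column key exposes only that column. A leaked master key still exposes everything, so the master key itself has to stay in the key vault.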

MaheshDBR
New Contributor II

@Hubert Dudek​ 

How can we decrypt data outside of Databricks with Python if it was encrypted with aes_encrypt?

Hubert-Dudek
Esteemed Contributor III

Hi, yes, you can decrypt it in any language that supports AES decryption. Here is an example for Java: https://www.baeldung.com/java-aes-encryption-decryption
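For Python, a minimal sketch could look like the one below. It assumes the third-party `cryptography` package is installed and that the data was encrypted in ECB mode with PKCS padding. As far as I know that was the default when aes_encrypt was introduced, while newer runtimes default to GCM, so check which mode was used; other modes need different handling. The function names are hypothetical, and the encrypt helper is included only so the round trip can be checked locally.

```python
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes


def aes_ecb_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Decrypt AES/ECB data and remove the PKCS padding."""
    decryptor = Cipher(algorithms.AES(key), modes.ECB()).decryptor()
    padded = decryptor.update(ciphertext) + decryptor.finalize()
    unpadder = padding.PKCS7(128).unpadder()  # the AES block size is 128 bits
    return unpadder.update(padded) + unpadder.finalize()


def aes_ecb_encrypt(plaintext: bytes, key: bytes) -> bytes:
    """Encrypt with AES/ECB and PKCS padding (for the local round trip only)."""
    padder = padding.PKCS7(128).padder()
    padded = padder.update(plaintext) + padder.finalize()
    encryptor = Cipher(algorithms.AES(key), modes.ECB()).encryptor()
    return encryptor.update(padded) + encryptor.finalize()


key = b"abcdefghijklmnop"  # 16-byte example key, the same key as used for encryption
blob = aes_ecb_encrypt(b"+48 123 456 789", key)
print(aes_ecb_decrypt(blob, key).decode("utf-8"))
```

To decrypt real data, pass the raw bytes of the encrypted column to `aes_ecb_decrypt`. If the column was exported as base64 or hex text, decode it to bytes first.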
