04-03-2022 08:21 AM
Encrypt and decrypt personal data with Spark Databricks.
We create a table that will include personal information. However, we want to hide personal identifiers so no one can see them.
We set a key. The key needs to be 16, 24, or 32 bytes long (1 byte = 1 character for plain ASCII text). We use a widget for that, which is only for development purposes. In production, we should store the key in Key Vault.
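The length rule above can be checked before the key is ever passed to Spark. Below is a minimal sketch in plain Python; the helper name `validate_aes_key` is made up for illustration and is not part of any Databricks API. It counts bytes rather than characters, since the "1 byte = 1 char" shortcut only holds for ASCII.

```python
def validate_aes_key(key: str) -> bytes:
    """Return the key as UTF-8 bytes, or raise if the length is not valid for AES.

    Hypothetical helper for illustration only.
    """
    key_bytes = key.encode("utf-8")
    if len(key_bytes) not in (16, 24, 32):
        raise ValueError(
            f"AES key must be 16, 24, or 32 bytes long, got {len(key_bytes)}"
        )
    return key_bytes


# 16 ASCII characters are 16 bytes.
print(len(validate_aes_key("abcdefghijklmnop")))  # 16

# Non-ASCII characters take more than one byte each in UTF-8,
# so 8 characters can already be 16 bytes.
print(len(validate_aes_key("éééééééé")))  # 16
```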
We insert data into the table and encrypt the phone field with the aes_encrypt function. The aes_encrypt and aes_decrypt functions are available since Databricks Runtime 10.3.
Now we can preview the data. Without decryption, the encrypted data is unreadable.
When we want to read it, we need to use the aes_decrypt function with our key.
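The steps above (create, insert with encryption, preview, decrypt) could look roughly like the sketch below. It is not the code from the original notebook: the table name `person_demo`, the columns, and the sample values are made up, and storing the ciphertext as a base64 string is an assumption chosen to make the preview printable. The Spark SQL functions used (`aes_encrypt`, `aes_decrypt`, `base64`, `unbase64`) are real; the `spark` session is passed in, as it would be in a notebook.

```python
def build_statements(key: str, table: str = "person_demo") -> dict:
    """Build the SQL for each step. Table, columns and sample row are hypothetical."""
    if "'" in key:
        # The key is embedded as a SQL string literal below (development only).
        raise ValueError("key must not contain a single quote")
    return {
        "create": (
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(id INT, name STRING, phone STRING)"
        ),
        # aes_encrypt returns BINARY; base64 turns it into a storable string.
        "insert": (
            f"INSERT INTO {table} "
            f"SELECT 1, 'Alice', base64(aes_encrypt('555-0100', '{key}'))"
        ),
        # Preview: the phone column shows only base64 ciphertext.
        "preview": f"SELECT * FROM {table}",
        # Reading: decode base64, decrypt with the same key, cast back to text.
        "decrypt": (
            "SELECT id, name, "
            f"CAST(aes_decrypt(unbase64(phone), '{key}') AS STRING) AS phone "
            f"FROM {table}"
        ),
    }


def run_demo(spark, key: str) -> None:
    """Run all four steps against an existing SparkSession."""
    statements = build_statements(key)
    spark.sql(statements["create"])
    spark.sql(statements["insert"])
    spark.sql(statements["preview"]).show(truncate=False)
    spark.sql(statements["decrypt"]).show(truncate=False)
```

In a notebook, this would be called as `run_demo(spark, dbutils.widgets.get("key"))`, assuming a text widget named `key` holds the development key.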
👉 Please also watch my video about aes_encrypt and aes_decrypt:
➡️ https://www.youtube.com/watch?v=OGLf_PiFMks
Link to GitHub with the above notebook: https://github.com/hubert-dudek/databricks-hubert/blob/main/linkedin/decrypt%20encrypt/decrypt%20enc...
04-03-2022 06:41 PM
@Hubert Dudek thank you for sharing this.
Is it possible to get this key from AWS KMS? Also, what are the other ways to efficiently encrypt and decrypt data using multiple keys and a salt? Encrypting with a single key could pose a high risk: if it gets leaked, the entire column or database is exposed.
04-04-2022 02:18 AM
Hi @Aman Sehgal
- Yes, it is possible to get a key from a key vault. I am using Azure Key Vault for that. However, the code doesn't look as clean as in the above example, as I need to use dbutils secrets and mix Python with SQL.
- Multiple keys and salt: I think you need your own script/implementation for that.
- The perfect solution would be to encrypt the whole column at the table creation level (like in Synapse dedicated SQL).
- I know there will be some improvements for handling PID data once Unity Catalog is released (lineage, classification). However, I think the whole topic is still relatively young in Databricks.
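To illustrate the first point, here is a minimal sketch of reading the key from a secret scope and mixing Python with SQL. `dbutils.secrets.get(scope=..., key=...)` is the real Databricks call; the scope name `demo-scope`, the secret name `aes-key`, and the table `person_demo` with a base64-encoded `phone` column are assumptions made for this example. `spark` and `dbutils` are passed in, as they exist in a notebook.

```python
def decrypt_phones(spark, dbutils,
                   scope: str = "demo-scope",      # hypothetical secret scope
                   secret_name: str = "aes-key",   # hypothetical secret name
                   table: str = "person_demo"):    # hypothetical table
    """Fetch the AES key from a secret scope and decrypt the phone column."""
    key = dbutils.secrets.get(scope=scope, key=secret_name)
    if "'" in key:
        # The key is placed inside a SQL string literal below.
        raise ValueError("key must not contain a single quote")
    query = (
        "SELECT id, name, "
        f"CAST(aes_decrypt(unbase64(phone), '{key}') AS STRING) AS phone "
        f"FROM {table}"
    )
    return spark.sql(query)
```

In a notebook this would be called as `decrypt_phones(spark, dbutils).show()`. Note that the key becomes part of the query text in this sketch, which is one reason the approach is less tidy than the widget example.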
03-04-2023 07:43 PM
@Hubert Dudek
How can we decrypt data outside of Databricks with Python when it was encrypted with aes_encrypt?
03-06-2023 02:22 AM
Hi, yes, you can do that in any language that supports AES decryption. Here is an example for Java: https://www.baeldung.com/java-aes-encryption-decryption
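For the Python case asked about above, a rough sketch follows. It rests on assumptions that should be verified against your runtime: that aes_encrypt was used in GCM mode, and that the binary output is laid out as a 12-byte IV, followed by the ciphertext, followed by a 16-byte authentication tag. It also relies on the third-party `cryptography` package, which is imported lazily so that the layout helper works without it. Data encrypted in ECB mode would need different handling.

```python
from typing import Tuple

GCM_IV_LEN = 12   # bytes; assumed IV length at the start of the blob
GCM_TAG_LEN = 16  # bytes; assumed tag length at the end of the blob


def split_gcm_blob(blob: bytes) -> Tuple[bytes, bytes]:
    """Split an encrypted value into (iv, ciphertext_with_tag)."""
    if len(blob) < GCM_IV_LEN + GCM_TAG_LEN:
        raise ValueError("value is too short to hold an IV and a GCM tag")
    return blob[:GCM_IV_LEN], blob[GCM_IV_LEN:]


def decrypt_value(blob: bytes, key: bytes) -> bytes:
    """Decrypt one value outside Databricks (requires `pip install cryptography`)."""
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    iv, ciphertext_with_tag = split_gcm_blob(blob)
    # AESGCM.decrypt expects the tag appended to the ciphertext.
    return AESGCM(key).decrypt(iv, ciphertext_with_tag, None)
```

If the column was stored as a base64 string, decode it first, for example `decrypt_value(base64.b64decode(value), b"abcdefghijklmnop")` with the same 16, 24, or 32-byte key that was used for encryption.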