04-03-2022 08:21 AM
Encrypt and decrypt personal data with Spark Databricks.
We create a table that will include personal information. However, we want to hide personal identifiers so no one can see them.
We set a key. The key needs to be 16, 24, or 32 bytes long (for plain ASCII text, 1 character = 1 byte). We use a widget for that, but only for development purposes. In production, we should store the key in Key Vault.
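As a quick sanity check before passing a key to aes_encrypt, the length rule above can be sketched in Python. The helper name `check_aes_key` is hypothetical and not part of Databricks or the original notebook; it only assumes the key is typed as text and encoded as UTF-8.

```python
# Hypothetical helper (not a Databricks API): validates an AES key typed as text.
VALID_AES_KEY_LENGTHS = (16, 24, 32)  # bytes, i.e. AES-128, AES-192, AES-256


def check_aes_key(key: str) -> int:
    """Return the key size in bits, or raise ValueError if the length is invalid."""
    # Count bytes, not characters: non-ASCII characters take more than one byte in UTF-8.
    n_bytes = len(key.encode("utf-8"))
    if n_bytes not in VALID_AES_KEY_LENGTHS:
        raise ValueError(
            f"AES key must be 16, 24, or 32 bytes long, got {n_bytes} bytes"
        )
    return n_bytes * 8
```

Counting encoded bytes rather than characters matters when a key contains non-ASCII characters, where the "1 byte = 1 char" shortcut no longer holds.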
We insert data into the table, encrypting the phone field with the aes_encrypt function. The aes_encrypt and aes_decrypt functions are available since Databricks Runtime 10.3.
Now we can preview the data. Without decryption, the encrypted phone values are unreadable.
To read them, we need to call the aes_decrypt function with the same key.
👉 Please also watch my video about aes_encrypt and aes_decrypt:
➡️ https://www.youtube.com/watch?v=OGLf_PiFMks
Link to GitHub with the above notebook: https://github.com/hubert-dudek/databricks-hubert/blob/main/linkedin/decrypt%20encrypt/decrypt%20enc...
04-03-2022 06:41 PM
@Hubert Dudek thank you for sharing this.
Is it possible to get this key from AWS KMS? Also, what are the other ways to efficiently encrypt and decrypt data using multiple keys and a salt? Encrypting with a single key could pose a high risk: if it gets leaked, the entire column or database will be exposed.
04-04-2022 02:18 AM
Hi @Aman Sehgal
03-04-2023 07:43 PM
@Hubert Dudek
How can we decrypt data outside of Databricks with Python if it was encrypted with aes_encrypt?
03-06-2023 02:22 AM
Hi, yes, in any language supporting AES decryption. Here is an example for Java https://www.baeldung.com/java-aes-encryption-decryption
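For Python specifically, a minimal sketch might look like the one below. It rests on several assumptions that should be checked against your runtime: that the value was encrypted in GCM mode without additional authenticated data, that the output is laid out as a 12-byte IV followed by the ciphertext and a 16-byte authentication tag, and that the third-party `cryptography` package is installed. The function name `decrypt_spark_gcm` is hypothetical. Values written in ECB mode would need different handling.

```python
import base64


def decrypt_spark_gcm(blob: bytes, key: bytes) -> bytes:
    """Decrypt a value assumed to be laid out as IV (12 bytes) + ciphertext + tag."""
    # Third-party dependency (assumption): pip install cryptography
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    iv, ciphertext_and_tag = blob[:12], blob[12:]
    # AESGCM.decrypt expects the tag appended to the ciphertext and
    # raises InvalidTag if the key or the data is wrong.
    return AESGCM(key).decrypt(iv, ciphertext_and_tag, None)


def decrypt_spark_gcm_base64(encoded: str, key: bytes) -> bytes:
    """Same as above, for values that were exported as base64 text."""
    return decrypt_spark_gcm(base64.b64decode(encoded), key)
```

The key must be the same 16, 24, or 32 bytes used on the Databricks side, for example `key.encode("utf-8")` if it was entered as text. If the column was exported with something like base64(aes_encrypt(...)), the second helper handles the decoding step.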