04-03-2022 08:21 AM
Encrypt and decrypt personal data with Spark Databricks.
We create a table that will include personal information. However, we want to hide personal identifiers so no one can see them.
We set a key. The key must be 16, 24, or 32 bytes long (for ASCII text, 1 character = 1 byte). We use a widget for that, but only for development purposes. In production, we should store the key in Key Vault.
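As a rough illustration of the key-length rule, here is a minimal sketch in plain Python. The helper names `validate_aes_key` and `generate_key` are hypothetical and not part of the notebook; the check assumes the key string is converted to bytes as UTF-8, so a non-ASCII character counts as more than one byte.

```python
import secrets

VALID_KEY_LENGTHS = (16, 24, 32)  # AES-128, AES-192, AES-256


def validate_aes_key(key: str) -> bytes:
    """Return the key as bytes, or raise if its byte length is not valid for AES."""
    key_bytes = key.encode("utf-8")  # assumption: key text is encoded as UTF-8
    if len(key_bytes) not in VALID_KEY_LENGTHS:
        raise ValueError(
            f"key is {len(key_bytes)} bytes; expected one of {VALID_KEY_LENGTHS}"
        )
    return key_bytes


def generate_key(num_chars: int = 32) -> str:
    """Generate a random ASCII key with the requested number of characters."""
    if num_chars not in VALID_KEY_LENGTHS:
        raise ValueError(f"num_chars must be one of {VALID_KEY_LENGTHS}")
    # token_hex(n) returns 2 * n hexadecimal characters, all ASCII.
    return secrets.token_hex(num_chars // 2)
```

A key generated this way uses only hexadecimal characters, so it carries half as many random bits as its byte length suggests; that may be acceptable for a development widget, but a production key would more likely come from a secret store.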
We insert data into the table and encrypt the phone field with the aes_encrypt function. The aes_encrypt and aes_decrypt functions are available since Databricks Runtime 10.3.
Now we can preview the data. Without decryption, the data is unreadable.
To read it, we need to use the aes_decrypt function with our key.
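The encrypt and decrypt steps could be sketched in PySpark roughly as follows. This is not the notebook's code: the column name `phone`, the helper names, the base64 wrapping, and the explicit `'GCM'` mode are all assumptions made for illustration. The mode argument can be dropped to use the runtime's default, as long as encryption and decryption agree.

```python
# Sketch only: column name, helper names, base64 wrapping and 'GCM' are assumptions.
ENCRYPT_EXPR = "base64(aes_encrypt(phone, _key, 'GCM'))"
DECRYPT_EXPR = "cast(aes_decrypt(unbase64(phone), _key, 'GCM') as string)"


def _apply(df, key, sql_expr):
    # Imported lazily so this module still loads where PySpark is not installed.
    from pyspark.sql import functions as F

    return (
        df.withColumn("_key", F.lit(key))        # pass the key as a literal column
        .withColumn("phone", F.expr(sql_expr))   # overwrite phone with the result
        .drop("_key")
    )


def encrypt_phone(df, key):
    """Replace the phone column with base64-encoded AES ciphertext."""
    return _apply(df, key, ENCRYPT_EXPR)


def decrypt_phone(df, key):
    """Replace the encrypted phone column with the original text."""
    return _apply(df, key, DECRYPT_EXPR)


# Hypothetical usage in a notebook, with the key read from a secret scope:
#   key = dbutils.secrets.get(scope="my-scope", key="aes-key")
#   encrypted_df = encrypt_phone(df, key)
#   decrypt_phone(encrypted_df, key).show()
```

Passing the key through a literal column rather than formatting it into the SQL string avoids quoting problems if the key contains special characters.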
Please also watch my video about aes_encrypt and aes_decrypt:
https://www.youtube.com/watch?v=OGLf_PiFMks
Link to GitHub with the above notebook: https://github.com/hubert-dudek/databricks-hubert/blob/main/linkedin/decrypt%20encrypt/decrypt%20enc...
04-03-2022 06:41 PM
@Hubert Dudek thank you for sharing this.
Is it possible to get this key from AWS KMS? Also, what are other ways to efficiently encrypt and decrypt data using multiple keys and a salt? Encrypting with a single key could pose a high risk: if it gets leaked, the entire column or database will be exposed.
04-04-2022 02:18 AM
Hi @Aman Sehgal
03-04-2023 07:43 PM
@Hubert Dudek
How can we decrypt data outside of Databricks with Python when it was encrypted with aes_encrypt?
03-06-2023 02:22 AM
Hi, yes, this is possible in any language that supports AES decryption. Here is an example for Java: https://www.baeldung.com/java-aes-encryption-decryption
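For Python, a minimal sketch could look like the one below. It rests on several assumptions that should be checked against your own data: that the value was encrypted in GCM mode, that Spark stored it as a 12-byte IV followed by the ciphertext and a 16-byte tag, that it was exported as base64 text, and that the third-party `cryptography` package is installed. The function names are hypothetical.

```python
import base64

GCM_IV_LEN = 12   # assumed: the IV is the first 12 bytes of the payload
GCM_TAG_LEN = 16  # assumed: a 16-byte authentication tag ends the payload


def split_gcm_payload(blob: bytes):
    """Split an encrypted payload into (iv, ciphertext_with_tag)."""
    if len(blob) < GCM_IV_LEN + GCM_TAG_LEN:
        raise ValueError("payload too short to hold an IV and a tag")
    return blob[:GCM_IV_LEN], blob[GCM_IV_LEN:]


def decrypt_spark_gcm(value_b64: str, key: str) -> str:
    """Decrypt one base64-encoded value produced with aes_encrypt in GCM mode."""
    # Imported lazily; requires `pip install cryptography`.
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    iv, ciphertext_with_tag = split_gcm_payload(base64.b64decode(value_b64))
    # The key string is used as raw UTF-8 bytes (16, 24, or 32 bytes long).
    plaintext = AESGCM(key.encode("utf-8")).decrypt(iv, ciphertext_with_tag, None)
    return plaintext.decode("utf-8")
```

If the data was encrypted in ECB mode instead, there is no IV or tag to split off, and an ECB cipher with PKCS padding would be needed.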