encryption

Phani1
Contributor III

Hi Databricks, could you please guide me on the scenario below?

Here is the use case we are trying to solve for:

  1. Our environment currently uses “Voltage” as the encryption tool for data in S3, in conjunction with a business-provided data encryption and masking catalog.
  2. We are looking to replace that tool with native Databricks capabilities, if possible, while still meeting 256-bit AES encryption standards.
  • Scenario:
    1. Is it possible to apply encryption at the S3 bucket (where the schema and catalog are maintained) using a customer-managed key? This is the main item we need to solve for.
    2. Once the data is encrypted, we need to read it into Databricks DataFrames using custom UDFs, where built-in functions such as aes_encrypt/aes_decrypt could be used alongside the data masking capabilities of Unity Catalog.
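For reference, this is roughly how we expect the built-in functions to be used (a sketch only; the table and column names are illustrative, and in practice the key would come from a secret scope rather than being hard-coded):

```sql
-- Encrypt a column with a 32-byte key (AES-256); key shown inline for
-- illustration only -- a real job would fetch it from a secret scope.
SELECT aes_encrypt(ssn, '32-byte-key-32-byte-key-32bytes!') AS ssn_enc
FROM customers;

-- Decrypt it back for authorized readers (aes_decrypt returns BINARY,
-- hence the cast back to STRING).
SELECT cast(aes_decrypt(ssn_enc, '32-byte-key-32-byte-key-32bytes!') AS STRING) AS ssn
FROM customers_encrypted;
```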

 

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager

Hi @Phani1 , 

Let’s break down your scenario and address each part:

  1. Data Encryption at S3 Bucket Using Customer-Managed Key:

    • Databricks provides support for customer-managed keys (CMKs) to help protect and control access to encrypted data.
    • You can configure your own key to encrypt the data on the Amazon S3 bucket where your data resides.
    • Specifically, you can use a customer-managed key for workspace storage to encrypt your workspace’s root S3 bucket.
    • This encryption applies to data stored in your root S3 bucket, including DBFS storage and various Databricks artifacts.
    • Note that existing data is not re-encrypted; the key applies only to new data written after the key is configured.
  2. Custom UDFs and AES Encryption:

    • Once your data is encrypted, you can read it into Databricks DataFrames (DFs) using custom UDFs.
    • For AES encryption, you can create your own UDFs that utilize standard functions like aes_encrypt and aes_decrypt.
    • These UDFs can be applied to your encrypted data within Databricks notebooks or jobs.
    • Additionally, you can leverage the data masking functionalities provided by Unity Catalog to further control access to sensitive data.
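As a sketch of the second point (the secret scope and table names here are placeholders, not part of your environment), the built-in functions can be combined with a Databricks secret so the key never appears in plain text in the query:

```sql
-- 'encryption-scope' and 'aes-key' are hypothetical secret scope/key names.
SELECT
  customer_id,
  aes_encrypt(email, secret('encryption-scope', 'aes-key')) AS email_enc
FROM sensitive.customers;
```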

In summary, Databricks allows you to manage your own keys for workspace storage encryption, ensuring data security while providing flexibility for custom UDFs and other data processing tasks. If you encounter any specific challenges during implementation, feel free to ask for further assistance.
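For the Unity Catalog masking piece, a minimal sketch looks like the following (the function, group, and table names are illustrative assumptions, not names from your environment):

```sql
-- Masking function: members of 'pii_readers' see the real value,
-- everyone else sees a redacted placeholder.
CREATE OR REPLACE FUNCTION mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE WHEN is_account_group_member('pii_readers') THEN ssn
            ELSE '***-**-****' END;

-- Attach the mask to the column.
ALTER TABLE sensitive.customers
  ALTER COLUMN ssn SET MASK mask_ssn;
```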

Happy encrypting! 🛡🔒


4 REPLIES


DeborahRoe
New Contributor II

Thanks for the clear guidance! Integrating AWS KMS for CMK in Databricks and using custom UDFs for encryption/decryption align perfectly with our goals. Appreciate your assistance!

Kaniz
Community Manager

Hi @DeborahRoe , I want to express my gratitude for your effort in selecting the most suitable solution. It's great to hear that your query has been successfully resolved. Thank you for your contribution.


AliaCollier
New Contributor II

To replace "Voltage" with Databricks encryption, follow these steps: set up a Customer Managed Key in AWS, configure the S3 bucket, read data in Databricks, and implement custom UDFs for AES encryption/decryption.
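As a sketch of the “configure the S3 bucket, read data in Databricks” steps (the credential and bucket names below are placeholders, and this assumes the bucket already uses your KMS key for server-side encryption): register the bucket as a Unity Catalog external location so Databricks can read it directly.

```sql
-- 'my_s3_cred' is a storage credential you would create beforehand;
-- the bucket path is illustrative.
CREATE EXTERNAL LOCATION IF NOT EXISTS encrypted_landing
  URL 's3://my-encrypted-bucket/landing'
  WITH (STORAGE CREDENTIAL my_s3_cred);
```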
