Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

I/F security about using medallion architecture

ShanQiwei
New Contributor

I'm new to writing requirement definitions, and I'd like to ask a question about interface (I/F) security.

My question is:
Do I need to define the authentication and security mechanisms (such as OAuth2, Managed Identity, Service Principals, etc.) between the systems shown below? Or do I also need to define security between the bronze, silver, and gold layers within the lakehouse?

Our data pipeline is:
VPC on AWS (client system) → S3 → Lakehouse (bronze → silver → gold) → Serverless compute

2 REPLIES

ManojkMohan
Honored Contributor II

@ShanQiwei 

| Interface or Layer | Should You Define Security? | Typical Mechanisms | Reference Links |
|---|---|---|---|
| VPC → S3 | Yes | IAM roles, service accounts, credentials, policies | https://docs.aws.amazon.com/AmazonS3/latest/userguide/security-best-practices.html |
| S3 → Lakehouse | Yes | Service principals, managed identities, access keys | https://docs.databricks.com/security/access-control/service-principals.html |
| Lakehouse Bronze → Silver → Gold | Sometimes (context-driven) | Platform roles, catalog permissions, ACLs, data masking | https://docs.databricks.com/data-governance/unity-catalog/index.html |
| Lakehouse → Serverless Compute | Yes | Managed identities, OAuth2, tokens, ACLs | https://learn.microsoft.com/en-us/azure/architecture/serverless/security-serverless-applications |
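
To make the first row concrete, here is a minimal sketch of what "IAM roles and policies" could look like for the client-to-S3 interface: a least-privilege policy document that only allows uploads under a single landing prefix. The function name, bucket name, and prefix are hypothetical and chosen for illustration; the actual actions and resources depend on what your client system needs to do.

```python
import json


def landing_write_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy document that only allows uploads under one prefix."""
    prefix = prefix.strip("/")  # normalize so the ARN has exactly one separator
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowUploadToLandingPrefix",
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    policy = landing_write_policy("example-landing-bucket", "incoming")
    print(json.dumps(policy, indent=2))
```

A requirement definition would typically state which role the client assumes and attach a policy of roughly this shape to it, rather than listing long-lived access keys.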

Coffee77
Contributor III

I'll try to summarize and go directly to the key points as I see them:

- Client to S3 👉 SAS token or OAuth 2.0 with service-to-service authentication (preferred)

- Databricks to S3 👉 Use a service principal or managed identities (preferred)

- Bronze/Silver/Gold 👉 Create a separate catalog per layer, or separate schemas/databases within one catalog, to hold the bronze, silver, and gold layers, all under Unity Catalog governance. Then grant permissions to users, groups, or service principals according to the layers they should be allowed to interact with.

- Serverless cluster 👉 Use the "Permissions" settings to control who can access it and how. Configure as needed.
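
As a rough illustration of the Bronze/Silver/Gold point, the sketch below generates Unity Catalog `GRANT` statements for a one-schema-per-layer layout. The catalog name, principal names, and the privilege mapping are all assumptions made up for this example, not a recommended policy; adapt them to your own groups and service principals.

```python
# Hypothetical mapping: layer (schema) -> principal -> privileges on that schema.
LAYER_GRANTS = {
    "bronze": {"ingest-sp": ["USE SCHEMA", "SELECT", "MODIFY"]},
    "silver": {"data-engineers": ["USE SCHEMA", "SELECT", "MODIFY"]},
    "gold": {"analysts": ["USE SCHEMA", "SELECT"]},
}


def grant_statements(catalog: str, layer_grants: dict) -> list:
    """Return GRANT statements for a catalog with one schema per layer."""
    statements = []
    # Every principal needs USE CATALOG before schema-level grants are usable.
    principals = sorted({p for grants in layer_grants.values() for p in grants})
    for principal in principals:
        statements.append(
            f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`;"
        )
    # One schema-level grant per (layer, principal) pair.
    for layer, grants in layer_grants.items():
        for principal, privileges in grants.items():
            privs = ", ".join(privileges)
            statements.append(
                f"GRANT {privs} ON SCHEMA {catalog}.{layer} TO `{principal}`;"
            )
    return statements


if __name__ == "__main__":
    # "lakehouse" is a hypothetical catalog name.
    for stmt in grant_statements("lakehouse", LAYER_GRANTS):
        print(stmt)
```

The generated statements can be reviewed and then run in a SQL editor or notebook by someone with sufficient privileges on the catalog. Keeping the mapping in one place makes it easier to write down in a requirement definition which principals touch which layer.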


Lifelong Learner Cloud & Data Solution Architect | https://www.youtube.com/@CafeConData