Administration & Architecture

How to configure AWS so that a Databricks workspace can only access the S3 access point using a VPC

gabriel_lazo
New Contributor II

My team requires a configuration so that a Databricks workspace can connect to an AWS S3 access point through a VPC, and so that other Databricks workspaces cannot access it if they are not within the route table.
I have searched online, but I have only found guides for configuring the S3 access point with a single user or a single EC2 instance.

Could you guide me on how I can achieve this?

3 REPLIES

Kaniz
Community Manager

Hi @gabriel_lazo, configuring Databricks to connect to an AWS S3 access point through a VPC while ensuring that other Databricks workspaces cannot access it requires some careful setup.

Let’s break it down:
Instance Profiles for S3 Access:

  • Recommended Approach: Use instance profiles to control data access to S3. You can load IAM roles as instance profiles in Databricks and attach them to clusters. This allows you to manage permissions effectively.
  • The AWS user who creates the IAM role must have permissions to create or update IAM roles, IAM policies, S3 buckets, and cross-account trust relationships.
  • The Databricks user who adds the IAM role as an instance profile in Databricks must be a workspace admin.
  • Once added, you can grant users, groups, or service principals permission to launch clusters with the instance profile (see the sketch after this list).
  • Protect access to the instance profile using both cluster access control and notebook access control.
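
As a rough sketch of that cluster-launch step (assuming the Clusters API 2.0 /clusters/create endpoint; the workspace URL, token, ARN, and cluster settings below are placeholders, not values from this thread):

    import requests

    # Placeholders -- replace with your own workspace URL, access token, and instance profile ARN
    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"
    TOKEN = "<personal-access-token>"
    INSTANCE_PROFILE_ARN = "arn:aws:iam::<account-id>:instance-profile/<profile-name>"

    # Create a cluster that assumes the instance profile, so S3 access is governed
    # by the role's IAM policy rather than by per-user AWS keys.
    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.0/clusters/create",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "cluster_name": "s3-access-point-cluster",
            "spark_version": "14.3.x-scala2.12",
            "node_type_id": "i3.xlarge",
            "num_workers": 1,
            "aws_attributes": {"instance_profile_arn": INSTANCE_PROFILE_ARN},
        },
    )
    resp.raise_for_status()
    print(resp.json()["cluster_id"])

The instance profile still has to be registered with the workspace by an admin (via the admin settings or the Instance Profiles API) before a cluster can reference it.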

Access S3 with URIs and AWS Keys:

  • Set Spark properties to configure AWS keys for S3 access.
  • Databricks recommends using secret scopes to store credentials securely.
  • Create a secret scope and grant users access to read it.
  • Set Spark properties in a cluster’s Spark configuration using the following snippet:

    AWS_SECRET_ACCESS_KEY={{secrets/scope/aws_secret_access_key}}
    AWS_ACCESS_KEY_ID={{secrets/scope/aws_access_key_id}}

  • Read from S3 using commands like:

    aws_bucket_name = "my-s3-bucket"
    df = spark.read.load(f"s3a://{aws_bucket_name}/flowers/delta/")
    display(df)
    dbutils.fs.ls(f"s3a://{aws_bucket_name}/")

Open-Source Hadoop Options:

VPC Endpoints and S3 Access Points:
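
A rough sketch of the VPC-endpoint side (assuming a gateway endpoint for S3 and boto3; the endpoint ID, region, account ID, and access point name are placeholders): the endpoint policy only allows requests made through the named access point, and because only the intended workspace's route tables are associated with the endpoint, other workspaces have no route to it.

    import json
    import boto3

    # Placeholders -- substitute your own endpoint ID, region, account ID, and access point name
    VPC_ENDPOINT_ID = "vpce-0123456789abcdef0"
    ACCESS_POINT_ARN = "arn:aws:s3:us-east-1:<account-id>:accesspoint/<access-point-name>"

    # Endpoint policy that only permits S3 requests routed through the named access point.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowOnlyViaAccessPoint",
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": "*",
                "Condition": {"StringEquals": {"s3:DataAccessPointArn": ACCESS_POINT_ARN}},
            }
        ],
    }

    # Attach the policy to the gateway endpoint; only subnets whose route tables are
    # associated with this endpoint can reach S3 through it at all.
    ec2 = boto3.client("ec2", region_name="us-east-1")
    ec2.modify_vpc_endpoint(VpcEndpointId=VPC_ENDPOINT_ID, PolicyDocument=json.dumps(policy))

In practice the endpoint policy generally also needs to allow the S3 buckets Databricks itself depends on (the workspace root, artifact, and log buckets), or clusters in that workspace may fail to start.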

If you need further assistance, feel free to ask! 🚀

gabriel_lazo
New Contributor II

Hello @Kaniz,

What is the VPC endpoint policy to control access to the S3 bucket, or where can I find it?

Best Regards!

Kaniz
Community Manager

Hey there! Thanks a bunch for being part of our awesome community! 🎉 

We love having you around and appreciate all your questions. Take a moment to check out the responses – you'll find some great info. Your input is valuable, so pick the best solution for you. And remember, if you ever need more help, we're here for you!

Keep being awesome! 😊🚀

 
