Databricks Community

christys · ‎05-14-2021

Taha · ‎06-23-2021

If you already have an S3 bucket with your data in it, the first step is to connect it to a Databricks workspace so the workspace has access to it. You can then query the bucket's contents from notebooks (or jobs) that run on clusters (compute resources) in that workspace.

Here's a guide on the docs site that walks through the process to connect a bucket: https://docs.databricks.com/data/data-sources/aws/amazon-s3.html

Although the guide covers several options, I'd recommend using instance profiles and mounting via DBFS for simplicity.
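To make that recommendation a bit more concrete, here is a rough sketch of what the mount step could look like from a notebook. It assumes the cluster was launched with an instance profile that already grants access to the bucket, so no keys appear in the code. The helper name `mount_bucket` and its parameters are made up for illustration; `dbutils.fs.mounts()` and `dbutils.fs.mount()` are the Databricks utilities it relies on, and `dbutils` is passed in explicitly because it only exists inside a Databricks notebook or job.

```python
def mount_bucket(dbutils, bucket: str, mount_name: str) -> str:
    """Mount an S3 bucket under /mnt/<mount_name> unless it is already mounted.

    Assumption: the cluster's instance profile grants access to `bucket`,
    so no credentials are passed here. `bucket` and `mount_name` are
    hypothetical placeholders chosen by the caller.
    """
    source = f"s3a://{bucket}"
    mount_point = f"/mnt/{mount_name}"

    # dbutils.fs.mounts() lists existing mounts; each entry has a mountPoint.
    already_mounted = {m.mountPoint for m in dbutils.fs.mounts()}

    # Mounting the same path twice raises an error, so skip if it exists.
    if mount_point not in already_mounted:
        dbutils.fs.mount(source, mount_point)

    return mount_point
```

In a notebook you would call something like `mount_bucket(dbutils, "my-example-bucket", "my-data")` (both names are placeholders) and then read from the returned path, for example with `dbutils.fs.ls("/mnt/my-data")`.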


What is the most efficient way to start an S3 bucket?
