What is the most efficient way to get started with an S3 bucket?
05-14-2021 11:25 AM
1 REPLY
06-23-2021 10:10 AM
So if you've got an S3 bucket with your data in it, the first thing you'll need to do is connect it to a Databricks workspace to grant access. Then you can start querying the contents of the bucket from notebooks (or running jobs) by using clusters (compute resources) within the Databricks workspace to execute commands.
Here's a guide on the docs site that walks through the process to connect a bucket: https://docs.databricks.com/data/data-sources/aws/amazon-s3.html
Although the guide covers several options, I'd recommend using instance profiles and mounting the bucket via DBFS for simplicity.
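For example, assuming your cluster was launched with an instance profile that has read access to the bucket, a minimal sketch in a Python notebook might look like this (the bucket name, mount point, and folder path below are placeholders, not real names):

```python
# Assumes the cluster's instance profile grants s3:GetObject / s3:ListBucket
# on the bucket, so no access keys are needed in the code.
# "my-data-bucket" and "/mnt/my-data" are placeholder names.

# Mount the bucket into DBFS (only needs to be done once per workspace).
dbutils.fs.mount(
    source="s3a://my-data-bucket",
    mount_point="/mnt/my-data"
)

# Browse the mounted path to confirm access.
display(dbutils.fs.ls("/mnt/my-data"))

# Query the contents with Spark, e.g. a folder of Parquet files.
df = spark.read.parquet("/mnt/my-data/events/")
df.createOrReplaceTempView("events")
display(spark.sql("SELECT count(*) FROM events"))
```

Note that `dbutils`, `spark`, and `display` are available automatically inside a Databricks notebook, so no imports are required there. Once the mount exists, any cluster in the workspace can read the bucket through the `/mnt/...` path.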

