
Accessing the S3 Files

Databricks3
Contributor

I am using a Unity Catalog-enabled cluster. I need to read files that the source team places in a specific location (landing) in S3. My metastore already points to a different bucket. Do I need to create an external location pointing to the landing bucket in S3? Additionally, how can I read the data from those files?

4 REPLIES

Anonymous
Not applicable

You have a couple of options to consider:

  1. External Location: You can create an external location in your Unity Catalog metastore that points to the landing bucket in S3. This lets Unity Catalog govern access to the files where they are, without copying or moving them into the metastore's managed storage. You can create the external location from the Databricks UI, with SQL (`CREATE EXTERNAL LOCATION`), or through the Databricks REST API or SDKs.

    To create the external location, specify the S3 URL (bucket and prefix) where the files are located, together with a storage credential that is allowed to access it. Once you have been granted READ FILES on the location, the data is read directly from S3 without any data movement.

  2. Direct Read: You can also query the files in the S3 landing bucket directly by path, using SQL or Spark commands, without first registering a table. Note that on a Unity Catalog cluster, path-based access to `s3://` URIs is normally still authorized through an external location (or a volume), so in most setups this complements option 1 rather than replacing it. The query engine reads the files from S3 using distributed processing.

    To read the data directly from the S3 landing bucket, point SQL or the Spark DataFrame API at the S3 path, then filter, aggregate, or join the datasets as needed.
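To make the two options concrete, here is a minimal sketch rather than a tested recipe. It assumes the landing files are CSV with a header row, that a storage credential already exists, and that you have the privileges to create external locations. Every name in it (`landing_loc`, `landing_cred`, `data_engineers`, the bucket and the `orders/` prefix) is hypothetical; substitute your own.

```python
# Sketch only: all object names and paths below are hypothetical placeholders.

def external_location_ddl(name, url, credential):
    """Build the SQL that registers an S3 prefix as a Unity Catalog external location."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' "
        f"WITH (STORAGE CREDENTIAL `{credential}`)"
    )


def grant_read_files(name, principal):
    """Build the SQL that lets a user or group read files under the location."""
    return f"GRANT READ FILES ON EXTERNAL LOCATION `{name}` TO `{principal}`"


def read_landing_csv(spark, path):
    """Read CSV files under `path` into a DataFrame (assumes a header row)."""
    return spark.read.format("csv").option("header", "true").load(path)


LANDING_URL = "s3://my-landing-bucket/landing"  # hypothetical bucket and prefix

ddl = external_location_ddl("landing_loc", LANDING_URL, "landing_cred")
grant = grant_read_files("landing_loc", "data_engineers")
print(ddl)
print(grant)

# In a Databricks notebook, where `spark` is already defined, the statements
# above could be run once by someone with the required privileges:
#   spark.sql(ddl)
#   spark.sql(grant)
#
# Reading by path with the DataFrame API:
#   df = read_landing_csv(spark, f"{LANDING_URL}/orders/")
#   df.show(5)
#
# Reading by path with SQL:
#   spark.sql(f"SELECT * FROM csv.`{LANDING_URL}/orders/`").show(5)
```

The helper functions only build strings, so you can review the generated SQL before running anything against the metastore.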

Databricks3
Contributor

Could you share an example of reading a file in both cases? It would be really helpful.

Anonymous
Not applicable

Hi @Databricks3 

Hope you are well. Just wanted to check whether you were able to find an answer to your question. If so, would you like to mark an answer as best? It would be really helpful for the other members too.

Cheers!

krikotti
New Contributor II

Did anyone find a solution to this? I am also having trouble reading a file from S3 using boto3 on a Unity Catalog-enabled cluster. I created the S3 external location and granted sufficient access. Any help on this?

The same path and data are accessible using PySpark without any issues.
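
One possible explanation, offered as an assumption and not a confirmed diagnosis: boto3 resolves AWS credentials through its own chain (environment variables, instance profile, and so on) and does not pick up Unity Catalog external location grants, which are applied when Spark accesses the path. Since the path is readable from PySpark, a workaround sketch is to fetch the raw bytes with Spark's `binaryFile` reader and then process them in plain Python. The helper names and the sample path are hypothetical.

```python
import csv
import io


def read_object_bytes(spark, path):
    """Return the raw bytes of a single file using Spark's binaryFile source.

    Intended for small files only: the whole object is collected to the driver.
    """
    rows = (
        spark.read.format("binaryFile")
        .load(path)
        .select("content")
        .limit(1)
        .collect()
    )
    if not rows:
        # The path (or glob) matched no files.
        raise FileNotFoundError(path)
    return bytes(rows[0]["content"])


def rows_from_csv_bytes(data, encoding="utf-8"):
    """Parse CSV bytes into a list of dicts, as you might after a boto3 get_object."""
    return list(csv.DictReader(io.StringIO(data.decode(encoding))))


# In a Databricks notebook, where `spark` is already defined (hypothetical path):
#   data = read_object_bytes(spark, "s3://my-landing-bucket/landing/orders/part-0001.csv")
#   records = rows_from_csv_bytes(data)
```

This keeps authorization on the Spark side, where the external location grant already works, and leaves only the parsing to ordinary Python code.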
