mount s3 bucket with specific endpoint

impulsleistung
New Contributor III

Environment:

  • Azure Databricks
  • Language: Python

I can access my s3 bucket via:

boto3.client('s3', endpoint_url='https://gateway.storjshare.io', ... )

and it also works via:

boto3.resource('s3', endpoint_url='https://gateway.storjshare.io', ... )

As a next step, I want to mount this S3 bucket with its specific endpoint in Azure Databricks, but there doesn't seem to be an option for that.

How do I write the mount routine in the notebook?
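For reference, here is a minimal sketch of what such a mount might look like. It assumes the cluster's Hadoop S3A client honors the `fs.s3a.endpoint` and `fs.s3a.path.style.access` options and that they can be passed through `extra_configs` of `dbutils.fs.mount`; the bucket and mount names are placeholders:

```python
# Sketch: mount an S3-compatible bucket (e.g. the Storj gateway) in Databricks.
# bucket_name, mount_name, access_key and secret_key are placeholders.

def build_mount_args(bucket_name: str, mount_name: str,
                     access_key: str, secret_key: str,
                     endpoint: str = "https://gateway.storjshare.io"):
    """Assemble the arguments for dbutils.fs.mount with a custom S3A endpoint."""
    source = f"s3a://{bucket_name}"
    extra_configs = {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        # Point the S3A client at the non-AWS gateway:
        "fs.s3a.endpoint": endpoint,
        # Many S3-compatible gateways require path-style (not virtual-host) access:
        "fs.s3a.path.style.access": "true",
    }
    return source, f"/mnt/{mount_name}", extra_configs

# Inside a Databricks notebook you would then call:
# source, mount_point, extra_configs = build_mount_args(...)
# dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=extra_configs)
```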


Hubert-Dudek
Esteemed Contributor III

In the AWS Console, under "My security credentials," generate a new access key and secret key.

Set them in the Hadoop configuration:

sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", SECRET_KEY)

Now you can read files from your S3 bucket directly:

df = spark.read.csv(f"s3a://{aws_bucket_name}/test.csv", header=True, inferSchema=True)

You can also mount the bucket permanently with:

dbutils.fs.mount(f"s3a://{ACCESS_KEY}:{SECRET_KEY}@{aws_bucket_name}", f"/mnt/{mount_name}")

It is safer to store the access key and secret key in a key vault.
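A minimal sketch of that approach, assuming a Databricks secret scope has already been created and backed by your key vault; the scope and key names below are hypothetical:

```python
# Sketch: fetch S3 credentials from a Databricks secret scope instead of
# hard-coding them in the notebook. The scope "s3-creds" and the key names
# are hypothetical. Inside a notebook, pass dbutils.secrets as the client
# (injected here as a parameter so the helper can be exercised outside Databricks).

def load_s3_creds(secrets_client, scope: str = "s3-creds"):
    """Return (access_key, secret_key) read from a secret scope."""
    access_key = secrets_client.get(scope=scope, key="access-key")
    secret_key = secrets_client.get(scope=scope, key="secret-key")
    return access_key, secret_key

# In a notebook: ACCESS_KEY, SECRET_KEY = load_s3_creds(dbutils.secrets)
```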

This won't work. I'm using Azure Databricks, and I want to read/write objects from/to an S3 bucket with a specific endpoint (endpoint_url='https://gateway.storjshare.io').

So this is not an I/O operation from Databricks to AWS. This also matters because Azure Data Factory only supports reading from such a source, not writing back. So far, there's no user-friendly way to do this.

Kaniz
Community Manager

Hi @Kevin Ostheimer, we haven't heard from you since the last response from @Hubert Dudek, and I was checking back to see if you have a resolution yet.

If you have any solution, please share it with the community as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

impulsleistung
New Contributor III

Hi! I just tried it. I'm on Azure and the endpoint is proprietary; see my reply above.

Anonymous
Not applicable

Hi @Kevin Ostheimer,

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
