Databricks Community

impulsleistung · 10-23-2022

Environment:

Azure Databricks
Language: Python

I can access my S3 bucket via:

boto3.client('s3', endpoint_url='https://gateway.storjshare.io', ... )

and it also works via:

boto3.resource('s3', endpoint_url='https://gateway.storjshare.io', ... )

As a next step, I want to mount this S3 bucket with its custom endpoint in Azure Databricks, but I can't find any option for that.

How do I write the mount routine in the notebook?

Hubert-Dudek · 10-23-2022

In the AWS Console, under "My security credentials", please generate a new access key and secret key.

Then set them in the cluster's Hadoop configuration:

# S3A connector properties (the older s3n connector is deprecated)
sc._jsc.hadoopConfiguration().set("fs.s3a.access.key", ACCESS_KEY)
sc._jsc.hadoopConfiguration().set("fs.s3a.secret.key", SECRET_KEY)

Now you can read files from your S3 bucket directly:

df = spark.read.csv(f"s3a://{aws_bucket_name}/test.csv", header=True, inferSchema=True)

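You can also mount the bucket persistently with this command:

encoded_secret_key = SECRET_KEY.replace("/", "%2F")  # a "/" in the secret key would break the URI
dbutils.fs.mount(f"s3a://{ACCESS_KEY}:{encoded_secret_key}@{aws_bucket_name}", f"/mnt/{mount_name}")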

It is safer to store your access key and secret key in a key vault.
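A minimal sketch of that advice, assuming the keys are kept in a Databricks secret scope (such a scope can be backed by Azure Key Vault). The scope name "storj" and the key names "access-key" and "secret-key" are hypothetical placeholders. The helper takes any callable with the scope=/key= signature of dbutils.secrets.get, so in a notebook that function can be passed in directly.

```python
# Sketch: read the S3 credentials from a secret scope instead of hard-coding them.
# The default scope and key names are hypothetical placeholders.

def load_s3_credentials(get_secret, scope: str = "storj",
                        access_key_name: str = "access-key",
                        secret_key_name: str = "secret-key") -> tuple:
    """Return (access_key, secret_key) via a dbutils.secrets.get-style callable."""
    access_key = get_secret(scope=scope, key=access_key_name)
    secret_key = get_secret(scope=scope, key=secret_key_name)
    return access_key, secret_key


# In a Databricks notebook:
# ACCESS_KEY, SECRET_KEY = load_s3_credentials(dbutils.secrets.get)
```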

impulsleistung · 10-25-2022

This won't work. I'm using Azure Databricks, and I want to read and write objects in an S3 bucket that sits behind a custom endpoint: endpoint_url='https://gateway.storjshare.io'

So this is not an I/O operation between Databricks and AWS. This matters because Azure Data Factory only supports reading from this bucket, NOT writing back to it. So far, there's no user-friendly way to do so.
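One possible route, sketched under assumptions rather than tested: point Spark's S3A connector at the custom endpoint through the standard Hadoop properties fs.s3a.endpoint and fs.s3a.path.style.access, then either set them on the running cluster or pass them to dbutils.fs.mount as extra_configs. Whether the Databricks runtime on Azure honours these keys for a non-AWS gateway is an assumption, as is the need for path-style access; spark.sparkContext._jsc is a private handle that may be unavailable in some cluster access modes.

```python
# Sketch only: assumes a Databricks notebook (where `spark` and `dbutils` exist)
# and that the cluster's S3A connector honours the standard Hadoop fs.s3a.* keys
# for a non-AWS endpoint on Azure Databricks. Untested against Storj.

STORJ_ENDPOINT = "https://gateway.storjshare.io"  # endpoint from the question


def s3a_settings(access_key: str, secret_key: str, endpoint: str = STORJ_ENDPOINT) -> dict:
    """Build Hadoop S3A properties for an S3-compatible, non-AWS endpoint."""
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.endpoint": endpoint,
        # Path-style URLs (endpoint/bucket/key) are often expected by
        # S3-compatible gateways; drop this if virtual-host style works.
        "fs.s3a.path.style.access": "true",
    }


def configure_session(spark, settings: dict) -> None:
    """Option 1: set the properties on the cluster's Hadoop configuration."""
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    for name, value in settings.items():
        hadoop_conf.set(name, value)


def mount_bucket(dbutils, bucket: str, mount_name: str, settings: dict) -> str:
    """Option 2: mount the bucket, passing the same properties as extra_configs."""
    mount_point = f"/mnt/{mount_name}"
    # Credentials travel in extra_configs rather than in the URI, so a "/"
    # in the secret key does not need percent-encoding.
    dbutils.fs.mount(f"s3a://{bucket}", mount_point, extra_configs=settings)
    return mount_point
```

In a notebook this could look like configure_session(spark, s3a_settings(ACCESS_KEY, SECRET_KEY)) followed by spark.read.csv("s3a://my-bucket/test.csv", header=True) or df.write.parquet("s3a://my-bucket/out/"), where my-bucket is a hypothetical bucket name. If the connector accepts the endpoint, writing through Spark would cover the write-back case that Data Factory does not.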

impulsleistung · 10-25-2022

Hi! I just tried it, but I'm on Azure and the endpoint is proprietary; see my reply.

Anonymous · 11-27-2022

Hi @Kevin Ostheimer

Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!