Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

configure AWS authentication for serverless Spark

liu
New Contributor III

I only have an AWS Access Key ID and Secret Access Key, and I want to use this information to access S3.

However, the official documentation states that I need to set the AWS_SECRET_ACCESS_KEY and AWS_ACCESS_KEY_ID environment variables, but I cannot find a way to set these two environment variables.

documentation: Connect to Amazon S3 | Databricks on Google Cloud

7 REPLIES

szymon_dybczak
Esteemed Contributor III

Hi @liu ,

The proper way is to go to your cluster and set them in the advanced options section; that way they will be scoped at the cluster level. It's recommended to store the values themselves in a secret scope and reference them as environment variables:

Use a secret in a Spark configuration property or environment variable | Databricks on AWS
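
For example, in the cluster's Advanced options you can reference secrets directly in the Environment variables field. A minimal sketch, assuming your keys live in a secret scope called my_scope (the scope and key names here are placeholders):

AWS_ACCESS_KEY_ID={{secrets/my_scope/aws_access_key_id}}
AWS_SECRET_ACCESS_KEY={{secrets/my_scope/aws_secret_access_key}}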


But you can also configure them at notebook scope. I think the following Python snippet will be sufficient:

import os

# Set the AWS credentials as environment variables for the notebook session
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-key"
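
If you store the keys in a secret scope, a safer variant of the same notebook-level idea (the scope and key names below are placeholders) would be:

import os

# Pull the credentials from a Databricks secret scope instead of hardcoding them
os.environ["AWS_ACCESS_KEY_ID"] = dbutils.secrets.get(scope="my_scope", key="aws_access_key_id")
os.environ["AWS_SECRET_ACCESS_KEY"] = dbutils.secrets.get(scope="my_scope", key="aws_secret_access_key")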

 

liu
New Contributor III

hi @szymon_dybczak 

Thank you very much for your answer.
But I can't use another cluster; I can only use serverless. Can I set them for serverless?

I also configured them at notebook scope, but they are not working properly at the moment. I have been told that I do not have permission, and I am still investigating.

 

szymon_dybczak
Esteemed Contributor III

Sorry, somehow I didn't notice serverless in the thread title 😄 But I guess setting env variables at notebook scope should work.
One question: is there any reason why you can't use UC? The above approach is a deprecated method of configuring storage access.

liu
New Contributor III

@szymon_dybczak 

Sorry, I'm a newbie. Currently, I can only add S3 as external data through a role. Regarding your suggestion about using UC: can I turn the CSV files in S3 into a UC volume using only the access key ID and secret key, or is there another method?

szymon_dybczak
Esteemed Contributor III

@liu , I guess this could be related to a serverless limitation. In the documentation they say that you must use Unity Catalog to connect to external data sources. That's probably why you can't connect 😕
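
If UC isn't fully set up yet and you only have the keys, one possible workaround (just a sketch, untested on serverless; the bucket, catalog, schema, and volume names are placeholders) is to fetch the file with boto3 and drop it into a UC volume path:

import boto3

# Create an S3 client directly from the access key pair
s3 = boto3.client(
    "s3",
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
)

# Download the CSV into a Unity Catalog volume (volumes are mounted under /Volumes)
s3.download_file("my-bucket", "path/to/data.csv", "/Volumes/my_catalog/my_schema/my_volume/data.csv")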

 


 

liu
New Contributor III

@szymon_dybczak 
Thank you very much for your answer

I will try my best to get the S3 data connected to UC.

Once more, thank you

szymon_dybczak
Esteemed Contributor III

No problem @liu , if you need some help with setting up UC we are here to help 🙂
