
I would like to access S3 data in databricks

Karthe
New Contributor III

Hi all,

I am new to Databricks and I am trying to read data from S3. The video tutorials I have found access S3 with an access key ID and a secret access key. However, Databricks is showing me different options, and I don't know what to fill in here. Could you please explain, or direct me to the right tutorials?

# File location and type
file_location = "{{upload_location}}"
file_type = "{{file_type}}"

# CSV options
infer_schema = "{{infer_schema}}"
first_row_is_header = "{{first_row_is_header}}"
delimiter = "{{delimiter}}"

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)

display(df)

4 REPLIES

Mohit_m
Valued Contributor II

There are two ways in Databricks to read from S3. You can either read data using an IAM Role or read data using Access Keys.

You can find some examples here:

https://docs.databricks.com/_static/notebooks/data-import/s3.html

https://docs.databricks.com/administration-guide/cloud-configurations/aws/instance-profiles.html
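As a rough illustration of the two approaches, here is a minimal sketch that reads a CSV file from S3 in a notebook. It assumes a notebook-provided `spark` session; the bucket name and object key in the usage comment are placeholders, and the helper names `s3a_uri` and `read_csv_from_s3` are made up for this example. With an IAM role (instance profile) attached to the cluster, no keys are passed. With access keys, the standard Hadoop S3A settings `fs.s3a.access.key` and `fs.s3a.secret.key` are set first. Setting them through `spark.sparkContext` may not be permitted on every cluster configuration, so treat this as a starting point rather than the definitive method.

```python
def s3a_uri(bucket: str, key: str) -> str:
    """Build an s3a:// URI from a bucket name and an object key or prefix."""
    return f"s3a://{bucket.strip('/')}/{key.lstrip('/')}"


def read_csv_from_s3(spark, bucket, key, access_key=None, secret_key=None):
    """Read a CSV file from S3 into a DataFrame.

    If both keys are given, they are set on the Hadoop configuration.
    If they are omitted, the cluster's IAM role (instance profile) is
    assumed to grant access to the bucket.
    """
    if access_key and secret_key:
        hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
        hadoop_conf.set("fs.s3a.access.key", access_key)
        hadoop_conf.set("fs.s3a.secret.key", secret_key)
    return (
        spark.read.format("csv")
        .option("header", "true")       # first row holds column names
        .option("inferSchema", "true")  # let Spark guess column types
        .option("sep", ",")
        .load(s3a_uri(bucket, key))
    )


# Usage in a notebook (placeholder bucket and path):
# df = read_csv_from_s3(spark, "my-example-bucket", "data/sample.csv")
# display(df)
```

In the template from the question, the resulting `s3a://...` path is what would go into `file_location`.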

Karthe
New Contributor III

Thank you, Mohit. I still find it challenging, I believe because I am not clear on the fundamentals. Let me try to figure it out some other way. Thank you for sharing the answer.

AmanSehgal
Honored Contributor III

You can do any of the following:

  1. Use your AWS access key and secret key to mount an S3 bucket to DBFS.
  2. Create an instance profile and access the bucket through it.
  3. Use KMS encryption on the S3 bucket, and then use the same KMS key to mount the bucket to DBFS.
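The first two options can be sketched as follows. This is a minimal example, assuming it runs in a Databricks notebook where `dbutils` is available. The helper names `mount_source` and `mount_bucket`, the bucket name, and the mount name are placeholders invented for illustration. The secret key is URL-encoded because it may contain `/` characters, which would otherwise break the URI. With an instance profile attached to the cluster, the keys are simply left out.

```python
def mount_source(bucket, access_key=None, secret_key=None):
    """Build the s3a:// source URI passed to dbutils.fs.mount.

    With keys: s3a://<access_key>:<encoded_secret_key>@<bucket>
    Without keys (instance profile): s3a://<bucket>
    """
    if access_key and secret_key:
        # Escape "/" so the secret key does not break the URI.
        encoded_secret_key = secret_key.replace("/", "%2F")
        return f"s3a://{access_key}:{encoded_secret_key}@{bucket}"
    return f"s3a://{bucket}"


def mount_bucket(dbutils, bucket, mount_name, access_key=None, secret_key=None):
    """Mount the bucket under /mnt/<mount_name> and return the mount point."""
    mount_point = f"/mnt/{mount_name}"
    dbutils.fs.mount(mount_source(bucket, access_key, secret_key), mount_point)
    return mount_point


# Usage in a notebook (placeholder names; keep real keys out of notebook code):
# mount_point = mount_bucket(dbutils, "my-example-bucket", "my-data")
# display(dbutils.fs.ls(mount_point))
```

Once mounted, files can be read with an ordinary path such as `/mnt/my-data/sample.csv` in `file_location`.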

Vidula
Honored Contributor

Hi @Karthikeyan Palanisamy

Hope all is well! I just wanted to check whether you were able to resolve your issue. If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!
