Databricks Community

chanansh · ‎01-11-2023

I am trying to copy files from Azure to S3. I built a solution that compares file lists, copies each new file to a temporary file, and uploads it. However, I just found Auto Loader and would like to use that instead: https://docs.databricks.com/ingestion/auto-loader/index.html

The problem is that the documentation does not make clear how to pass the Azure Blob Storage credentials to the stream reader: tenant_id, container, account_url, client_id, client_secret, and the azure_path.

What is the API to do that?

Hubert-Dudek · ‎01-11-2023

Copying files using a data factory can be cheaper and faster.

If you want access to Blob Storage / Azure Data Lake Storage, you can also create a permanent mount in Databricks. I described how to do it here: https://community.databricks.com/s/feed/0D53f00001eQG.OHCA4

My blog: https://databrickster.medium.com/

BigMF · ‎01-11-2023

Agreed, Azure Data Factory is definitely a better approach if all you want to do is copy files to/from Azure Storage.

chanansh · ‎01-11-2023

I need it to run continuously so the copy stays up to date. Also, I only have read permissions on the Azure blob.

BigMF · ‎01-11-2023

ADF can be scheduled to run as often as needed, or triggered when files show up in a container. However, based on your other statement below, it appears you are not working in an Azure environment and only have access to the storage container. I guess you could use Databricks to copy the files, but it seems wasteful. As an analogy: it is like having a metal toolbox full of tools that are each useful for specific jobs, and using the box itself to hammer in a nail.

chanansh · ‎01-11-2023

Auto Loader is the solution for me, but I don't know how to set the credentials.

chanansh · ‎01-11-2023

I am not an Azure user. I only have read permissions on the blob.

-werners- · ‎01-12-2023

You can also use AWS Data Pipeline.

As I understand it, we are talking about a plain copy with no transformations.

In that case, firing up a Spark cluster is way too much overhead and way too expensive.

If you lack permissions to connect to the Azure blob, I would try to fix that rather than work around it with Databricks.

chanansh · ‎01-15-2023

I want to use Auto Loader. I just need to know how to pass credentials to the stream reader.
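For reference, a minimal sketch of one way the credentials could be passed, not a confirmed answer from this thread. It assumes the storage account is reachable through the ABFS driver (the `dfs.core.windows.net` endpoint) and that tenant_id, client_id, and client_secret belong to a service principal with read access. The Hadoop `fs.azure.account.*` keys are the standard ABFS OAuth settings; the helper names (`abfs_oauth_conf`, `abfss_uri`, `read_azure_with_auto_loader`) and all example values are made up for illustration.

```python
from typing import Dict


def abfs_oauth_conf(storage_account: str, tenant_id: str,
                    client_id: str, client_secret: str) -> Dict[str, str]:
    """Build per-account ABFS settings for service-principal (OAuth) access."""
    host = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{host}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{host}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{host}": client_id,
        f"fs.azure.account.oauth2.client.secret.{host}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{host}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Turn container + account + azure_path into an abfss:// URI."""
    return (f"abfss://{container}@{storage_account}.dfs.core.windows.net/"
            f"{path.lstrip('/')}")


def read_azure_with_auto_loader(spark, conf: Dict[str, str], source_uri: str):
    """Set the credentials on the Spark session, then open an Auto Loader stream.

    The credentials are not options on the stream reader itself; they are
    session-level settings that the ABFS file system picks up when reading.
    """
    for key, value in conf.items():
        spark.conf.set(key, value)
    return (spark.readStream
            .format("cloudFiles")                       # Auto Loader source
            .option("cloudFiles.format", "binaryFile")  # read files as raw bytes
            .load(source_uri))


# Usage in a Databricks notebook, where `spark` already exists
# (account, container, and path values are placeholders):
# conf = abfs_oauth_conf("myaccount", tenant_id, client_id, client_secret)
# df = read_azure_with_auto_loader(
#     spark, conf, abfss_uri("mycontainer", "myaccount", "incoming/"))
```

Note that Auto Loader yields a streaming DataFrame rather than copying files as-is, so writing the result to S3 still needs a `writeStream` sink or custom per-batch logic. Keeping the client secret in a Databricks secret scope instead of in notebook code is the usual precaution.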

Falokun · ‎01-20-2023

Just use a tool like GoodSync or GS RichCopy 360 to copy directly from blob storage to S3. I don't think you will run into problems like that with them.

copy files from azure to s3
