Copy files from Azure to S3
01-11-2023 09:46 AM
I am trying to copy files from Azure to S3. I've built a solution that compares file lists, copies each file manually to a temp location, and uploads it. However, I just found Auto Loader and would like to use that instead: https://docs.databricks.com/ingestion/auto-loader/index.html
The problem is that the documentation doesn't make clear how to pass the Azure Blob Storage credentials (tenant_id, container, account_url, client_id, client_secret and the azure_path) to the stream reader.
What is the API to do that?
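Roughly what I have so far, based on the Auto Loader docs; the container, account, and path below are just placeholders, and the credential part is what I can't figure out:

```python
# Sketch based on the Auto Loader docs; <container>, <account> and <azure_path> are placeholders.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")   # read the files as-is, no parsing
      .load("abfss://<container>@<account>.dfs.core.windows.net/<azure_path>"))

# Where do tenant_id, client_id and client_secret go?
```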
Labels:
- Autoloader
- Azure
- Client Secret
- Copy
01-11-2023 10:04 AM
Copying the files with Azure Data Factory can be cheaper and faster.
If you need access to Blob Storage / Azure Data Lake Storage, you can also create a permanent mount in Databricks. I described how to do that here: https://community.databricks.com/s/feed/0D53f00001eQG.OHCA4
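A minimal sketch of such a mount, assuming a service principal and a Databricks secret scope; every name below is a placeholder:

```python
# Placeholder names: replace the scope, secret key, IDs, container and account with your own.
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client-id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("<scope>", "<client-secret-key>"),
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

# Mount the container once; it then stays available at /mnt/azure-source across the workspace.
dbutils.fs.mount(
    source="abfss://<container>@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/azure-source",
    extra_configs=configs,
)
```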
01-11-2023 11:52 AM
Agreed, Azure Data Factory is definitely a better approach if all you want to do is copy files to/from Azure Storage.
01-11-2023 11:56 AM
I need it to stay up to date all the time, so it has to keep running continuously. In any case, I only have read permissions on the Azure blob.
01-11-2023 06:18 PM
ADF can be scheduled to run as often as needed, or triggered when files show up in a container. However, based on your other comment below, it sounds like you are not working in an Azure environment and only have access to the storage container. I guess you could use Databricks to copy the files, but it seems wasteful; the analogy I would use is owning a metal toolbox full of tools that are great for specific jobs and using the box itself to hammer in a nail.
01-11-2023 09:58 PM
Auto Loader is the solution for me, but I don't know how to set the credentials.
01-11-2023 11:55 AM
I am not an Azure user; I only have read permissions on the blob.
01-12-2023 02:35 AM
You can also use AWS Data Pipeline.
From what I have read, this is a plain copy with no transformations.
In that case, firing up a Spark cluster is way too much overhead and way too expensive.
If you lack permissions to connect to the Azure blob, I would try to fix that rather than work around it with Databricks.
01-15-2023 03:36 AM
I want to use Auto Loader. I just need to know how to pass the credentials to the stream reader.
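From what I can piece together, the credentials are not options on the stream reader itself; they are set as Spark configurations keyed by the storage account, and the reader then just loads the abfss:// path. A rough sketch, assuming a service-principal (OAuth) setup against ADLS Gen2; the account name, secret scope, container and path are placeholders:

```python
# Placeholder values; replace with your own service principal details.
account       = "<storage-account-name>"
tenant_id     = "<tenant-id>"
client_id     = "<client-id>"
client_secret = dbutils.secrets.get("<scope>", "<client-secret-key>")  # assumed secret scope

# The credentials live in per-account Spark configs, not in readStream options.
spark.conf.set(f"fs.azure.account.auth.type.{account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Auto Loader then reads the abfss:// path like any other source.
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")
      .load(f"abfss://<container>@{account}.dfs.core.windows.net/<azure_path>"))
```

Writing the discovered files out to S3 would still be a separate step (for example a foreachBatch sink that copies each new file), and the cluster needs its own AWS credentials or instance profile for the target bucket.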
01-20-2023 05:06 AM
Just use a tool like Goodsync or Gs Richcopy 360 to copy directly from Blob Storage to S3; I don't think you will run into problems like this that way.

