01-11-2023 09:46 AM
I am trying to copy files from Azure to S3. I've built a solution that compares file lists, copies each file manually to a temp file, and uploads it. However, I just found Auto Loader and would like to use it instead: https://docs.databricks.com/ingestion/auto-loader/index.html
The problem is that the documentation doesn't make it clear how to pass the Azure Blob Storage credentials (tenant_id, container, account_url, client_id, client_secret) and the Azure path to the stream reader.
What is the API for that?
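To make it concrete, this is roughly what I'm guessing it should look like (an untested sketch assuming ADLS Gen2 / abfss with a service principal; plain Blob Storage over wasbs uses different keys, and all names below are placeholders, not real values):

STORAGE_ACCOUNT = "<storage account>"   # placeholders, not real values
CONTAINER = "<container>"
TENANT_ID = "<tenant id>"
CLIENT_ID = "<client id>"
CLIENT_SECRET = "<client secret>"

# Service-principal (OAuth) credentials for ABFS, set on the Spark session
spark.conf.set(f"fs.azure.account.auth.type.{STORAGE_ACCOUNT}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{STORAGE_ACCOUNT}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{STORAGE_ACCOUNT}.dfs.core.windows.net", CLIENT_ID)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{STORAGE_ACCOUNT}.dfs.core.windows.net", CLIENT_SECRET)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{STORAGE_ACCOUNT}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/token")

# Auto Loader stream over the Azure path
df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "binaryFile")   # raw files; use json/csv/parquet for structured data
      .load(f"abfss://{CONTAINER}@{STORAGE_ACCOUNT}.dfs.core.windows.net/<some/path>"))

Is setting the credentials on spark.conf like this the intended way, or is there a dedicated option on the reader?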
01-11-2023 10:04 AM
Copying files with Azure Data Factory can be cheaper and faster.
If you want access to Blob Storage / Azure Data Lake Storage, you can also create a permanent mount in Databricks. I described how to do it here: https://community.databricks.com/s/feed/0D53f00001eQG.OHCA4
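Roughly, such a mount looks like this (a sketch with placeholder names; the linked post has the full walkthrough):

configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<client id>",
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get("<scope>", "<key>"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant id>/oauth2/token",
}

# Mount the container once; afterwards every cluster in the workspace can read it under /mnt/azure-source
dbutils.fs.mount(
    source="abfss://<container>@<storage account>.dfs.core.windows.net/",
    mount_point="/mnt/azure-source",
    extra_configs=configs,
)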
01-11-2023 11:52 AM
Agreed, Azure Data Factory is definitely a better approach if all you want to do is copy files to/from Azure Storage.
01-11-2023 11:56 AM
I need it to stay up to date, so it has to keep running continuously. In any case, I only have read permissions on the Azure blob.
01-11-2023 06:18 PM
ADF can be scheduled to run as often as needed or triggered when files show up in a container. However, based on your other reply below, it sounds like you are not working in an Azure environment and only have access to the storage container. You could use Databricks to copy the files, but it seems wasteful: like owning a metal toolbox full of specialized tools and using the box itself to hammer in a nail.
01-11-2023 09:58 PM
Auto Loader is the solution for me, but I don't know how to set the credentials.
01-11-2023 11:55 AM
I am not an Azure user. I only have read permissions on the blob.
01-12-2023 02:35 AM
You can also use AWS Data Pipeline.
From what I've read, we are talking about a plain copy with no transformations.
In that case, firing up a Spark cluster is way too much overhead, and way too expensive.
If you lack permissions to connect to the Azure blob, I would try to fix that rather than look for a workaround using Databricks.
01-15-2023 03:36 AM
I want to use Auto Loader. I just need to know how to pass the credentials to the stream reader.
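For reference, the S3 side I have in mind (once the read stream above works) would be roughly this. It's untested, the bucket name and checkpoint path are made up, and it continues from the df in my earlier sketch:

import os
import boto3

def copy_batch(batch_df, batch_id):
    # Upload each newly discovered file's bytes to S3; bucket and prefix are placeholders
    s3 = boto3.client("s3")
    for row in batch_df.select("path", "content").toLocalIterator():
        key = "copied/" + os.path.basename(row["path"])
        s3.put_object(Bucket="my-target-bucket", Key=key, Body=bytes(row["content"]))

(df.writeStream
   .foreachBatch(copy_batch)
   .option("checkpointLocation", "s3://my-target-bucket/_checkpoints/azure_copy")
   .trigger(availableNow=True)   # or drop the trigger to keep it running continuously
   .start())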
01-20-2023 05:06 AM
Just use tools like Goodsync or Gs Richcopy 360 to copy directly from the blob to S3; I don't think you will ever face problems like that.