03-08-2022 02:55 PM
code example
# a list of file paths
list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."]
# copy all files above to this folder
dest_path = "/dbfs/mnt/..."
for file_path in list_files_path:
    # copy function
    copy_file(file_path, dest_path)
I am running this in Azure Databricks and it works fine, but I am wondering if I can take advantage of the cluster's parallelism.
I know that I can run some kind of multi-threading on the driver node, but I am wondering if I can use pandas_udf to take advantage of the worker nodes as well (see the sketch below).
Thanks!
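To make the idea concrete, here is a minimal sketch of what I have in mind with pandas_udf. It assumes the /dbfs FUSE mount is readable and writable from the worker nodes; the paths are just placeholders and spark is the notebook's SparkSession:

import os
import shutil

import pandas as pd
from pyspark.sql.functions import col, pandas_udf

# placeholder source paths and destination folder from the question
list_files_path = ["/dbfs/mnt/...", "/dbfs/mnt/..."]
dest_path = "/dbfs/mnt/..."

# one row per source file so Spark can spread the copies across executors
paths_df = spark.createDataFrame([(p,) for p in list_files_path], ["src"])

@pandas_udf("string")
def copy_files(src: pd.Series) -> pd.Series:
    # runs on the workers; /dbfs is the FUSE mount, so plain shutil works there
    def _copy(path: str) -> str:
        shutil.copy(path, os.path.join(dest_path, os.path.basename(path)))
        return "ok"
    return src.map(_copy)

# repartition so the files are split across executors, then trigger the copies
paths_df.repartition(8).withColumn("status", copy_files(col("src"))).collect()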
03-09-2022 02:55 AM
@Jackie Chan , To use Spark parallelism you could register both source and destination as tables and use COPY INTO, or register just the source as a table and use CREATE TABLE CLONE.
If you want to do a plain copy, it is better to use the dbutils.fs library.
If you want to copy data regularly between ADLS/Blob Storage, nothing can catch up with Azure Data Factory. There you can build a copy pipeline, and it will be the cheapest and fastest option. If you need a dependency to run a Databricks notebook before/after the copy, you can orchestrate it there (on successful run, trigger the Databricks notebook, etc.), as Databricks is integrated with ADF.
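For illustration, a couple of minimal sketches of those options (the mount paths and table name are placeholders, and COPY INTO assumes the destination is registered as a Delta table):

# plain copy of a single file with dbutils.fs
dbutils.fs.cp("dbfs:/mnt/.../source_file", "dbfs:/mnt/.../dest_folder/")

# copy a whole folder recursively
dbutils.fs.cp("dbfs:/mnt/.../source_folder/", "dbfs:/mnt/.../dest_folder/", recurse=True)

# load new files into a Delta table with COPY INTO (idempotent, skips already-loaded files)
spark.sql("""
  COPY INTO my_dest_table
  FROM 'dbfs:/mnt/.../source_folder'
  FILEFORMAT = PARQUET
""")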
03-09-2022 11:04 PM
@Jackie Chan , Indeed ADF has massive throughput. So go for ADF if you want a plain copy (so no transformations).
04-27-2022 01:56 AM
Hi @Jackie Chan , Just a friendly follow-up. Do you still need help, or did the above responses help you find the solution? Please let us know.
04-27-2022 07:07 PM
@Jackie Chan , What's the size of the data you want to copy? If it's large, then use ADF.