saurabh18cs
Honored Contributor III

Hi, why are you writing to an intermediate location first rather than directly to your external data lake storage, which is also blob storage? With the little context I have, here is what I can say. Is this for a single file copy or for multiple files?

From a parallelization perspective, you can use Spark with a UDF. Create a DataFrame with the file paths as rows, then run a UDF that calls the shutil copy function for each path (dbutils will not work within a UDF). That way the whole cluster is used to parallelize the file transfer, distributing the CPU, disk and network bandwidth usage across the workers.
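A rough sketch of that idea, not a tested recipe for your setup. The names `copy_one`, `copy_with_spark` and `num_partitions` are made up for illustration, and it assumes every source and destination path is reachable through the local file system on each worker (for example via a mount), since shutil only understands local-style paths.

```python
import os
import shutil


def copy_one(src: str, dst: str) -> str:
    """Copy one file; return 'ok' or an error message instead of raising."""
    try:
        # Create the destination folder if needed ('.' when dst has no folder part).
        os.makedirs(os.path.dirname(dst) or ".", exist_ok=True)
        shutil.copy(src, dst)
        return "ok"
    except Exception as exc:
        # Returning the error keeps one bad file from failing the whole job.
        return f"error: {exc}"


def copy_with_spark(spark, pairs, num_partitions=64):
    """pairs: list of (src, dst) tuples; paths must be visible on every worker (assumption)."""
    # Imported here so copy_one can also be used without pyspark installed.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    copy_udf = udf(copy_one, StringType())
    # Repartition so the rows (and therefore the copies) spread across the executors.
    df = spark.createDataFrame(pairs, ["src", "dst"]).repartition(num_partitions)
    # collect() is the action that actually triggers the copies.
    return df.withColumn("status", copy_udf(col("src"), col("dst"))).collect()
```

You would call something like `copy_with_spark(spark, pairs)` from a notebook and then check the returned rows for any status that is not "ok". The partition count is only a guess to tune against your cluster size and file count.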

For a single-threaded, driver-side operation, either shutil or dbutils can work. You can also do driver-side multi-threading (for example with a thread pool, or asyncio), but then you are bounded by the driver node's capacity and its network bandwidth.
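For the driver-side multi-threaded variant, something along these lines might work. It swaps in a thread pool from `concurrent.futures` rather than asyncio, because shutil calls are blocking. `copy_many` and `max_workers` are hypothetical names, and the paths are again assumed to be local-style paths visible on the driver.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_many(pairs, max_workers=8):
    """Copy (src, dst) pairs concurrently on the driver; return the destination paths."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order and re-raises the first failure when iterated.
        return list(pool.map(lambda p: shutil.copy(p[0], p[1]), pairs))
```

Threads help here because the work is mostly I/O-bound, but every byte still flows through the driver, so this will not scale the way the UDF approach does.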

@yashojha