Speed up a for loop in Python (Azure Databricks)

Jackie · 03-08-2022

code example

# A list of file paths
list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."]

# Copy all of the files above to this folder
dest_path = "/dbfs/mnt/..."

for file_path in list_files_path:
    # copy function
    copy_file(file_path, dest_path)

I am running this in Azure Databricks and it works fine, but I am wondering whether I can take advantage of the cluster's parallelism.

I know that I can run some kind of multithreading on the driver (master) node, but I am wondering whether I can use pandas_udf to take advantage of the worker nodes as well.
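For reference, the driver-side multithreading I have in mind would look roughly like the sketch below. It is only an illustration: copy_file here is a stand-in built on shutil.copy (an assumption, since my real copy function may differ), and copy_files_threaded and max_workers=8 are hypothetical names and values I picked for the example.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor


def copy_file(file_path, dest_path):
    # Stand-in for the copy function in the loop above (assumption):
    # a plain shutil.copy, which returns the path of the new file.
    return shutil.copy(file_path, dest_path)


def copy_files_threaded(list_files_path, dest_path, max_workers=8):
    # Hypothetical helper. File copies are mostly I/O-bound, so threads
    # on the driver can overlap the time spent waiting on storage.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # map() keeps results in input order and re-raises the first
        # exception from a failed copy when the results are consumed.
        return list(
            executor.map(lambda p: copy_file(p, dest_path), list_files_path)
        )


# Usage (paths elided as in the loop above):
# copy_files_threaded(list_files_path, dest_path)
```

This only uses the driver node, so it does not answer the worker-node part of my question.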

Thanks!