05-17-2023 01:36 AM
As part of my batch processing, I archive a large number of small files received from the source system each day using the dbutils.fs.mv command.
This takes hours because dbutils.fs.mv moves the files one at a time.
How can I speed this up?
05-17-2023 02:02 AM
@Dean Lovelace
You can use multithreading: run several dbutils.fs.mv calls concurrently from the driver instead of one at a time.
See example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
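To make the suggestion concrete, here is a minimal sketch using Python's standard ThreadPoolExecutor. It assumes the moves are independent and I/O-bound, so driver-side threads can overlap them. The helper name move_files_parallel, the max_workers value, and the paths in the usage comment are illustrative assumptions, not taken from the linked post.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def move_files_parallel(pairs, move_fn, max_workers=16):
    """Run move_fn(src, dst) for each (src, dst) pair on a thread pool.

    Returns a list of (src, exception) tuples for moves that failed,
    so one bad file does not abort the whole batch.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to its source path for error reporting.
        futures = {pool.submit(move_fn, src, dst): src for src, dst in pairs}
        for fut in as_completed(futures):
            try:
                fut.result()
            except Exception as exc:
                failures.append((futures[fut], exc))
    return failures


# Usage in a Databricks notebook (paths are hypothetical):
# files = dbutils.fs.ls("/mnt/landing/")
# pairs = [(f.path, "/mnt/archive/" + f.name) for f in files]
# failures = move_files_parallel(pairs, dbutils.fs.mv)
```

The move function is passed in as a parameter so the same helper can be tried locally with shutil.move before pointing it at dbutils.fs.mv. A suitable max_workers depends on the storage backend and its request limits, so it is worth starting small and checking the returned failures before raising it.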