05-17-2023 01:36 AM
As part of my batch processing, I archive a large number of small files received from the source system each day using the dbutils.fs.mv command.
This takes hours as dbutils.fs.mv moves the files one at a time.
How can I speed this up?
05-17-2023 02:02 AM
@Dean Lovelace
You can use multithreading to issue several dbutils.fs.mv calls concurrently instead of moving the files one at a time.
See example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
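For illustration, here is a minimal sketch of the multithreading idea using Python's standard ThreadPoolExecutor. It is not taken from the linked article. The helper name move_files_parallel, the max_workers value of 16, and the /mnt/landing/ and /mnt/archive/ paths in the comments are all assumptions made for the example. The move function is passed in as a parameter so the helper can be tried outside a notebook; on Databricks you would pass dbutils.fs.mv.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def move_files_parallel(pairs, move_fn, max_workers=16):
    """Move each (source, destination) pair using a pool of threads.

    pairs       -- iterable of (source_path, destination_path) tuples
    move_fn     -- callable taking (source, destination), e.g. dbutils.fs.mv
    max_workers -- number of concurrent moves (16 is an arbitrary starting point)

    Returns a list of (source_path, exception) tuples for moves that failed.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every move up front and remember which source each future belongs to.
        futures = {pool.submit(move_fn, src, dst): src for src, dst in pairs}
        for future in as_completed(futures):
            try:
                future.result()  # re-raises any exception from the worker thread
            except Exception as exc:
                failures.append((futures[future], exc))
    return failures


# Hypothetical usage in a Databricks notebook (paths are placeholders):
#
#   files = dbutils.fs.ls("/mnt/landing/")
#   pairs = [(f.path, "/mnt/archive/" + f.name) for f in files]
#   failures = move_files_parallel(pairs, dbutils.fs.mv)
#   print(f"{len(failures)} files failed to move")
```

Threads suit this case because each move spends most of its time waiting on storage rather than using the CPU. The best max_workers value depends on your driver and storage account limits, so it is worth trying a few values and checking the returned failures list before relying on the archive.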