05-17-2023 01:36 AM
As part of my batch processing, I use the dbutils.fs.mv command to archive a large number of small files received from the source system each day.
This takes hours as dbutils.fs.mv moves the files one at a time.
How can I speed this up?
05-17-2023 02:02 AM
@Dean Lovelace
You can use multithreading to run several dbutils.fs.mv calls concurrently instead of one at a time.
See example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
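For illustration, here is a minimal sketch of that idea using Python's standard `concurrent.futures` thread pool. It is not taken from the linked post. The helper name `move_files_parallel`, the `max_workers` default, and the paths in the usage comment are assumptions made for this example. The move function is passed in as a parameter, so in a notebook you would hand it `dbutils.fs.mv`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def move_files_parallel(move_fn, pairs, max_workers=16):
    """Move (source, destination) pairs concurrently.

    move_fn is any callable taking (source, destination); on Databricks
    this would be dbutils.fs.mv. max_workers=16 is an arbitrary starting
    point, not a recommended value. Returns a list of (source, exception)
    tuples for the moves that failed.
    """
    failures = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one task per file and remember which source each future belongs to.
        futures = {pool.submit(move_fn, src, dst): src for src, dst in pairs}
        for future in as_completed(futures):
            try:
                future.result()
            except Exception as exc:
                # Collect the failure instead of aborting the remaining moves.
                failures.append((futures[future], exc))
    return failures


# Hypothetical usage in a Databricks notebook (both paths are placeholders):
# files = dbutils.fs.ls("/mnt/landing/")
# pairs = [(f.path, "/mnt/archive/" + f.name) for f in files]
# failures = move_files_parallel(dbutils.fs.mv, pairs)
```

Threads suit this case because each move mostly waits on storage I/O rather than using the driver's CPU. The best worker count depends on your storage and cluster, so it is worth trying a few values and checking the returned failures list after each run.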