05-17-2023 01:36 AM
As part of my batch processing I archive a large number of small files received from the source system each day using the dbutils.fs.mv command.
This takes hours as dbutils.fs.mv moves the files one at a time.
How can I speed this up?
- Labels: Files, Small Files
Accepted Solutions
05-17-2023 02:02 AM
@Dean Lovelace
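You can use multithreading: submit the dbutils.fs.mv calls to a thread pool so that many moves run concurrently instead of one at a time.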
See example here: https://nealanalytics.com/blog/databricks-spark-jobs-optimization-techniques-multi-threading/
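A minimal sketch of the idea, in case the link goes stale. This is not taken from the linked post; the helper name move_files_parallel, its parameters, and the worker count are my own assumptions for illustration. The move function is passed in as an argument, so in a notebook you would pass dbutils.fs.mv, and outside Databricks you can try it with something like shutil.move.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def move_files_parallel(names, src_dir, dst_dir, mv, max_workers=16):
    """Move each named file from src_dir to dst_dir using a thread pool.

    `mv` is any callable taking (source_path, destination_path), for
    example dbutils.fs.mv in a Databricks notebook. Returns a tuple
    (moved, failed), where failed holds (name, exception) pairs.
    """
    src_dir = src_dir.rstrip("/")
    dst_dir = dst_dir.rstrip("/")
    moved, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit one move per file and remember which file each future belongs to.
        futures = {
            pool.submit(mv, f"{src_dir}/{name}", f"{dst_dir}/{name}"): name
            for name in names
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises any error from the move
                moved.append(name)
            except Exception as exc:
                failed.append((name, exc))
    return moved, failed


# Hypothetical usage in a Databricks notebook (paths are placeholders):
# src = "dbfs:/mnt/landing/incoming"
# dst = "dbfs:/mnt/landing/archive"
# names = [f.name for f in dbutils.fs.ls(src) if not f.name.endswith("/")]
# moved, failed = move_files_parallel(names, src, dst, dbutils.fs.mv)
```

Threads help here because each move spends most of its time waiting on storage, not on the CPU. The best max_workers value depends on your storage account and driver size, so treat 16 as a starting point and check the failed list before assuming the archive is complete.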

