Hello @murtadha_s , here are some helpful tips and hints to help you further diagnose the slowness.
Totally expected behavior here: object-storage moves with dbutils.fs.mv will be much slower than HDFS. Under the hood, dbutils isn't doing an atomic rename; it's doing a full copy and then a delete. So when you move a large directory, it has to walk every file and move every byte, which is dramatically slower than HDFS, where a "move" is basically just a metadata update.
What likely caused the slowness
A move in this world is really just a copy followed by a delete, even when everything lives in the same filesystem. So instead of a quick server-side rename, every single byte gets pushed through the ABFS connector. That alone changes the performance profile pretty dramatically.
On top of that, dbutils.fs recursive calls run single-threaded by default. If you're dealing with a big directory tree full of tiny files, that one-lane highway becomes the bottleneck unless you intentionally parallelize the work.
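To show what "intentionally parallelize" looks like, here's a minimal sketch of the fan-out pattern using Python's concurrent.futures. It operates on local paths with shutil.move standing in for dbutils.fs.mv (which only exists inside Databricks); the function name, pool size, and paths are illustrative, not anything from the platform itself:

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def parallel_move(src_dir: str, dst_dir: str, max_workers: int = 16) -> int:
    """Move every file directly under src_dir to dst_dir using a thread pool.

    Illustrates the fan-out pattern; on Databricks you would call
    dbutils.fs.mv on each individual file instead of shutil.move.
    """
    os.makedirs(dst_dir, exist_ok=True)
    files = [f for f in os.listdir(src_dir)
             if os.path.isfile(os.path.join(src_dir, f))]

    def move_one(name: str) -> None:
        shutil.move(os.path.join(src_dir, name), os.path.join(dst_dir, name))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all futures to complete and surfaces any exceptions
        list(pool.map(move_one, files))
    return len(files)
```

The win comes from overlapping many small per-file round trips; for a handful of huge files it buys you much less.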
When you layer in Unity Catalog volumes, you hit another limitation: many of these I/O operations still run on the driver, not executors. So bulk moves can get throttled simply because the driver is doing all the heavy lifting.
And if you step outside volumes and operate directly on abfss:// paths under external locations, you add one more tax to the system: extra permission checks on every access. At low scale, no big deal. At high file counts, you definitely feel it.
Best-practice solutions (fastest to most practical)
If the source and destination live in the same ADLS Gen2 filesystem, the quickest path is to use Azure's native server-side moves. The Azure CLI can do true fast renames without shuttling data through the cluster. Something like:
az storage fs directory move -n <source-directory> -f <source-filesystem> --new-directory "<dest-filesystem>/<new-directory>" --account-name <storage-account> --auth-mode login
Because this executes fully inside ADLS, it's dramatically faster than dbutils.fs.mv, which has to copy data through the Spark cluster before cleaning up.
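If you'd rather script this from Python than the CLI, the azure-storage-file-datalake SDK exposes the same server-side rename via rename_directory. A hedged sketch (the wrapper function and placeholder arguments are mine, and it assumes that SDK package is installed; the SDK import is done lazily so the path helper works on its own):

```python
def rename_target(dest_filesystem: str, dest_path: str) -> str:
    # rename_directory expects the destination as "<filesystem>/<path>"
    return f"{dest_filesystem}/{dest_path.lstrip('/')}"

def server_side_move(account_url: str, credential, filesystem: str,
                     src_dir: str, dst_dir: str) -> None:
    """Illustrative wrapper around the ADLS Gen2 SDK's server-side rename."""
    # Imported lazily so rename_target stays usable without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    dir_client = (service.get_file_system_client(filesystem)
                         .get_directory_client(src_dir))
    # Metadata-only rename inside the service: no bytes pass through the caller.
    dir_client.rename_directory(new_name=rename_target(filesystem, dst_dir))
```

Because the rename happens entirely inside the storage service, it keeps the same performance profile as the CLI command above.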
If you do need to run the move from Databricks, you can at least help yourself by enabling parallel recursive dbutils.fs operations so the driver can fan out the work:
# Enable parallel recursive cp/mv/rm on the driver
spark.conf.set("spark.databricks.service.dbutils.fs.parallel.enabled", True)
# Now run a recursive move
dbutils.fs.mv("/path/src/", "/path/dst/", recurse=True)
A couple things to keep in mind here:
- This parallel mode kicks in only when you call the operation from the driver with recurse=True. On large folder trees, it can give you an order-of-magnitude improvement.
- For volume paths, the picture changes a bit. Volumes don't distribute dbutils.fs calls to executors, so some of the parallelization benefits can be muted.
- If youโre working with volumes, you may be better served by the Databricks Files REST API or the Databricks CLI (fs). Both are designed for file management on volumes and make it easier to build reliable scripted workflows.
- And avoid using shell-level moves like %sh mv for anything involving volumes; they aren't supported across volume boundaries. Stick with dbutils.fs operations or Azure-native moves when you're working directly against abfss:// paths.
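For the Files REST API route on volumes, a move ends up being read-copy-delete, since (as far as I know) there is no server-side rename endpoint there. A stdlib-only sketch, with the helper names and error handling being my own simplifications; the API path prefix is the documented Files API root, but treat the details as something to verify against your workspace:

```python
import urllib.request

API_PREFIX = "/api/2.0/fs/files"  # Databricks Files API root for volume files

def files_api_url(host: str, volume_path: str) -> str:
    """volume_path is absolute, e.g. /Volumes/<catalog>/<schema>/<volume>/a.csv"""
    return f"https://{host}{API_PREFIX}{volume_path}"

def _request(host: str, token: str, volume_path: str, method: str, data=None):
    req = urllib.request.Request(
        files_api_url(host, volume_path), data=data, method=method,
        headers={"Authorization": f"Bearer {token}"},
    )
    return urllib.request.urlopen(req)  # raises on non-2xx responses

def move_volume_file(host: str, token: str, src: str, dst: str) -> None:
    """Move one volume file: download it, upload the copy, delete the source."""
    body = _request(host, token, src, "GET").read()
    _request(host, token, dst, "PUT", data=body)
    _request(host, token, src, "DELETE")
```

Note that this still trombones the bytes through whatever machine runs the script, so it's a convenience for scripted workflows, not a performance fix.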
Practical decision guide
If you're moving data within the same ADLS filesystem, your best bet is the Azure CLI server-side move. It performs a true fast rename and avoids pushing bytes through the cluster.
If you're crossing containers, accounts, regions, or hopping between volume paths, things get heavier. When you can run the operation outside Databricks, lean on Azure's own tooling (CLI, Storage Explorer, anything that lets you parallelize) and skip the data-tromboning back through the cluster.
If you need to stay inside Databricks, enable the parallel dbutils mode and use recursive moves. Always test on a small sample first so you get a feel for performance under your specific runtime, permissions, and volume constraints.
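For that small-sample test, even a trivial timing wrapper makes the comparison concrete. This helper is my own sketch; in a notebook you'd pass it dbutils.fs.mv with your sample paths:

```python
import time

def timed_move(move_fn, *args, **kwargs):
    """Run a move callable and report elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = move_fn(*args, **kwargs)  # e.g. dbutils.fs.mv(src, dst, recurse=True)
    elapsed = time.perf_counter() - start
    print(f"move completed in {elapsed:.2f}s")
    return result, elapsed
```

Run it once with the parallel config off and once with it on, and you'll see exactly what the setting buys you on your file sizes and counts.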
Why HDFS was faster
HDFS treats a move as a metadata rename, so it's basically instant. Cloud storage plays by different rules. When you call dbutils.fs.mv, it isn't doing a rename at all; it's doing a full copy followed by a delete. That means performance scales with every byte and every file, not just a quick metadata tweak.
Hope this helps you better understand what is going on.
Cheers, Lou.