Hi @yashojha,
Thanks for the detailed writeup. A 22-minute write for a small file to a managed volume is definitely not expected behavior, especially when it works quickly in your lower environment. Let me walk through what is likely happening and how to troubleshoot and resolve it.
UNDERSTANDING THE BOTTLENECK
Unity Catalog managed volumes use a FUSE (Filesystem in Userspace) layer on the driver node that translates standard file system calls (open, write, close) into cloud object storage API calls under the hood. When you write a file to /Volumes/catalog/schema/volume/path, each write operation goes through this FUSE translation layer to the underlying cloud storage (Azure Blob/ADLS, S3, or GCS depending on your cloud provider).
The FUSE layer adds some overhead compared to writing to local disk, but 22 minutes for a small file indicates something beyond normal FUSE overhead. Since it works fine in your lower environment, the difference is almost certainly in the infrastructure or network configuration between the two environments.
INVESTIGATION STEPS
1. CHECK NETWORK AND STORAGE CONFIGURATION
This is the most likely cause. Compare these between your lower environment and the problematic one:
- Is the production workspace using a VNet/VPC with restrictive firewall rules or a private endpoint for storage?
- Is the managed storage account behind a firewall or private link that introduces routing latency?
- Are there NSG (Network Security Group) rules or route tables that force storage traffic through a firewall appliance or inspection layer?
- Is DNS resolution for the storage endpoint going through a custom DNS that may be slow?
To check where your managed volume data is stored, run:
DESCRIBE SCHEMA EXTENDED your_catalog.your_schema;
Look at the "Managed Location" in the output. Then verify that the cluster has efficient, direct network connectivity to that storage account.
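If you want to quantify the network path from the driver itself, a small probe like the sketch below measures DNS resolution and TCP connect latency to the storage endpoint. The host name shown is a hypothetical placeholder; substitute the account from the Managed Location output (for Azure, the `*.dfs.core.windows.net` endpoint):

```python
import socket
import time

def probe_endpoint(host, port=443, timeout=10.0):
    """Measure DNS resolution and TCP connect latency to a storage endpoint."""
    start = time.time()
    ip = socket.gethostbyname(host)  # DNS lookup (slow here => custom DNS issue)
    dns_ms = (time.time() - start) * 1000.0

    start = time.time()
    with socket.create_connection((ip, port), timeout=timeout):  # TCP handshake
        pass
    tcp_ms = (time.time() - start) * 1000.0
    return ip, dns_ms, tcp_ms

# Hypothetical endpoint; substitute the account shown under Managed Location:
# print(probe_endpoint("mystorageaccount.dfs.core.windows.net"))
```

Run it in both environments: a large gap in either number points at DNS or routing (firewall appliance, private endpoint hops) rather than anything in your code.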
2. IDENTIFY THE WRITE METHOD
How you write the file matters significantly. If your code writes the decrypted file using Python open() with many small write calls, each write may become an individual API call through FUSE. This is much slower than a single bulk write.
For example, this pattern is slow:
with open("/Volumes/catalog/schema/vol/file.dat", "wb") as f:
    for chunk in decrypt_stream(encrypted_data):
        f.write(chunk)  # Each small write goes through FUSE
3. CHECK DRIVER NODE RESOURCES
If the driver node is undersized or under memory pressure, FUSE operations can slow down. Check the Spark UI metrics tab during the write to see if the driver is resource-constrained.
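Alongside the Spark UI, you can sanity-check driver load directly from a notebook cell while the write is running (this uses the Unix-only os.getloadavg, which is available on Databricks driver nodes):

```python
import os

# Compare the 1-minute load average against the driver's core count;
# sustained load above the core count suggests the driver is saturated
load1, load5, load15 = os.getloadavg()
cores = os.cpu_count()
print(f"1-minute load average: {load1:.2f} on {cores} cores")
if load1 > cores:
    print("Driver appears CPU-saturated; consider a larger driver node")
```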
4. RUN A DIAGNOSTIC TEST
Run this quick test to isolate whether the issue is FUSE/storage or your decryption code:
import time

# Test 1: Write a simple test file to the volume
test_data = b"x" * (10 * 1024 * 1024)  # 10 MB of test data
start = time.time()
with open("/Volumes/your_catalog/your_schema/your_volume/test_file.bin", "wb") as f:
    f.write(test_data)
elapsed_volume = time.time() - start
print(f"Volume write: {elapsed_volume:.2f} seconds")

# Test 2: Write to local ephemeral disk for comparison
start = time.time()
with open("/local_disk0/test_file.bin", "wb") as f:
    f.write(test_data)
elapsed_local = time.time() - start
print(f"Local disk write: {elapsed_local:.2f} seconds")

# Test 3: Copy from local disk to volume using dbutils
start = time.time()
dbutils.fs.cp("file:/local_disk0/test_file.bin", "/Volumes/your_catalog/your_schema/your_volume/test_file2.bin")
elapsed_copy = time.time() - start
print(f"dbutils.fs.cp to volume: {elapsed_copy:.2f} seconds")
If Test 1 is slow but Test 3 is fast, the issue is the FUSE write path, and Option A below should help. If Tests 1 and 3 are both slow while Test 2 is fast, the issue is network/storage connectivity between the cluster and the managed storage account.
RECOMMENDED SOLUTIONS
OPTION A: WRITE TO LOCAL DISK FIRST, THEN COPY (QUICK FIX)
This is the simplest approach and often the fastest. Write your decrypted file to the driver's local ephemeral storage first, then use dbutils.fs.cp to copy it to the volume in a single optimized transfer:
import os

# Step 1: Decrypt to local ephemeral disk (fast, no FUSE overhead)
local_path = "/local_disk0/tmp/decrypted_file.dat"
os.makedirs(os.path.dirname(local_path), exist_ok=True)
with open(local_path, "wb") as f:
    f.write(decrypted_data)

# Step 2: Copy to volume in one bulk operation
volume_path = "/Volumes/your_catalog/your_schema/your_volume/decrypted_file.dat"
dbutils.fs.cp(f"file:{local_path}", volume_path)
OPTION B: WRITE DIRECTLY TO CLOUD STORAGE (SKIP VOLUME AS INTERMEDIATE)
Since you mention the volume is just intermediate storage before moving to your data lake, consider writing directly to your final destination using dbutils.fs or the cloud SDK. This eliminates the intermediate step entirely.
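As a rough sketch of what that looks like, assuming dbutils is available in the notebook and using a hypothetical abfss:// destination (substitute your real lake path), the copy collapses to a single transfer; the copy function is injectable only so the logic can be exercised outside Databricks:

```python
def publish_decrypted(local_path, final_path, copy_fn=None):
    """Copy the locally decrypted file straight to its final destination,
    skipping the volume entirely. copy_fn defaults to dbutils.fs.cp, which
    exists only inside a Databricks notebook."""
    if copy_fn is None:
        copy_fn = dbutils.fs.cp  # Databricks notebooks only
    copy_fn(f"file:{local_path}", final_path)

# Hypothetical destination; substitute your real lake path:
# publish_decrypted("/local_disk0/tmp/decrypted_file.dat",
#                   "abfss://lake@myaccount.dfs.core.windows.net/landing/file.dat")
```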
OPTION C: USE LARGER WRITE BUFFERS
If you must write through FUSE, use larger buffer sizes to reduce the number of individual API calls:
# Open the volume file with a large write buffer so many small chunk
# writes coalesce into far fewer FUSE/storage calls
with open("/Volumes/catalog/schema/vol/file.dat", "wb", buffering=16 * 1024 * 1024) as f:
    for chunk in decrypt_stream(encrypted_data):
        f.write(chunk)
DOCUMENTATION REFERENCES
- Unity Catalog Volumes: https://docs.databricks.com/en/volumes/index.html
- Work with files on Databricks: https://docs.databricks.com/en/files/index.html
- Databricks Utilities (dbutils.fs): https://docs.databricks.com/en/dev-tools/databricks-utils.html
I hope one of these approaches resolves the performance issue for you. Given that it works in your lower environment, I would start with Investigation Step 1 (comparing the network and storage configuration between environments) as that is the most common explanation for this kind of discrepancy. In the meantime, Option A (local disk write then copy) should give you a quick improvement while you track down the root cause.
* This reply used an agent system I built to research and draft this response based on the wide set of documentation I have available and previous memory. I personally review the draft for any obvious issues and for monitoring system reliability and update it when I detect any drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand new features.