Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Why is writing direct to Unity Catalog Volume slower than to Azure Blob Storage (xarray -> zarr)

songhan89
New Contributor

Hi,

I have some workloads where I need to export an xarray object to a Zarr store.

My UC Volume is backed by ADLS.

I tried to run a simple benchmark and found that UC Volume is considerably slower.

a) Using an fsspec ADLS store pointing to the same container that backs the UC Volume. Result: 42 s.

b) Treating the UC Volume path as a LocalStore. Result: 93 s.

Does UC Volume support async I/O? I suspect the lack of it could be the reason for the slower performance.
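One way to probe this suspicion independently of xarray and Zarr is to write many small files to a directory, first one at a time and then from a thread pool, and compare the timings. The sketch below is only an illustration: `probe_write_concurrency` and `write_chunk` are hypothetical helper names, the file count and size are arbitrary, and it measures plain file writes rather than anything specific to how Zarr stores chunks.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def write_chunk(directory, index, payload):
    """Write one small file, roughly mimicking a single Zarr chunk write."""
    path = os.path.join(directory, f"chunk_{index}")
    with open(path, "wb") as f:
        f.write(payload)
    return path


def probe_write_concurrency(directory, n_files=200, size=64 * 1024, workers=16):
    """Return (sequential_seconds, concurrent_seconds) for n_files writes each."""
    payload = os.urandom(size)
    os.makedirs(directory, exist_ok=True)

    # Sequential pass: one file after another.
    start = time.perf_counter()
    for i in range(n_files):
        write_chunk(directory, i, payload)
    sequential = time.perf_counter() - start

    # Concurrent pass: same number of files, written from a thread pool.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda i: write_chunk(directory, n_files + i, payload),
                      range(n_files)))
    concurrent = time.perf_counter() - start

    return sequential, concurrent


# Example call (the target directory is an assumption; point it at a scratch
# folder inside the volume being tested):
# seq, conc = probe_write_concurrency("/Volumes/<catalog>/<schema>/<volume>/probe")
# print(f"sequential: {seq:.1f} s, concurrent: {conc:.1f} s")
```

If the concurrent pass is not noticeably faster on the volume path but is on other storage, that would be consistent with per-file latency on the volume mount being the bottleneck, though it would not prove it.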


from glob import glob  # needed for the glob() call below

import adlfs
import xarray as xr
import zarr
from zarr.storage import FsspecStore

fs = adlfs.AzureBlobFileSystem(account_name=ABS_ACCOUNT_NAME, credential=SILVER_SAS_TOKEN, asynchronous=True)

files = glob('./samples/N1S*01')

args_cubed = {'engine': 'cfgrib',
    'filter_by_keys': {
        'dataType': 'fc',
        'typeOfLevel': ['surface', 'isobaricInhPa']
        },
    'chunks': {}
 }

def preprocess(ds):
    return ds.expand_dims(['time', 'step'])

ds = xr.open_mfdataset(
    files,
    preprocess=preprocess,
    parallel=True,
    **args_cubed
)

ds2 = ds.load()

store_azb = FsspecStore(fs, path='silver/nwp/azb_benchmark_v3.zarr')
store_uc = zarr.storage.LocalStore('/Volumes/mss-uc/silver/silver-volume/nwp/unity_catalog_benchmark_v3.zarr')
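The timing step itself is not shown in the snippet above. A minimal sketch of how the two writes could be timed follows; `time_write` is a hypothetical helper, and the commented usage assumes `ds2`, `store_azb` and `store_uc` are defined as above and that the dataset is written with `Dataset.to_zarr`.

```python
import time


def time_write(label, write_fn, repeats=1):
    """Run a zero-argument write function and report the best wall-clock time."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        write_fn()
        timings.append(time.perf_counter() - start)
    best = min(timings)
    print(f"{label}: {best:.1f} s")
    return best


# Assumed usage with the objects from the snippet above:
# time_write("ADLS via fsspec", lambda: ds2.to_zarr(store_azb, mode="w"))
# time_write("UC Volume as LocalStore", lambda: ds2.to_zarr(store_uc, mode="w"))
```

Because `ds2` is already loaded into memory, the measured time should mostly reflect the store writes rather than reading the GRIB files.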


[Attached image: songhan89_0-1738517230323.png]

 


