Hi,
I have some workloads where I need to export an xarray object to a Zarr store.
My UC Volume is backed by ADLS.
I ran a simple benchmark and found that writing through the UC Volume is considerably slower:
a) Using an fsspec ADLS store pointing to the same container behind the UC Volume. Result: 42 s.
b) Treating the UC Volume as a LocalStore. Result: 93 s.
Does UC Volume support async I/O? I suspect a lack of async I/O could be the reason for the slower performance.
from glob import glob

import adlfs
import xarray as xr
import zarr
from zarr.storage import FsspecStore
fs = adlfs.AzureBlobFileSystem(account_name=ABS_ACCOUNT_NAME, credential=SILVER_SAS_TOKEN, asynchronous=True)
files = glob('./samples/N1S*01')
args_cubed = {
    'engine': 'cfgrib',
    'filter_by_keys': {
        'dataType': 'fc',
        'typeOfLevel': ['surface', 'isobaricInhPa']
    },
    'chunks': {}
}
def preprocess(ds):
    return ds.expand_dims(['time', 'step'])
ds = xr.open_mfdataset(
    files,
    preprocess=preprocess,
    parallel=True,
    **args_cubed
)
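# Load everything into memory up front
ds2 = ds.load()
# a) fsspec ADLS store
store_azb = FsspecStore(fs, path='silver/nwp/azb_benchmark_v3.zarr')
# b) UC Volume path treated as a local directory
store_uc = zarr.storage.LocalStore('/Volumes/mss-uc/silver/silver-volume/nwp/unity_catalog_benchmark_v3.zarr')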
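The snippet above stops before the timed step, so here is a minimal sketch of how the two writes could be timed. The helper name `time_writes` is hypothetical, and the commented-out `ds2.to_zarr(..., mode='w')` calls are an assumption about how each store was written; the post itself does not show them.

```python
import time
from typing import Callable, Dict


def time_writes(writers: Dict[str, Callable[[], object]]) -> Dict[str, float]:
    """Run each zero-argument writer once and return wall-clock seconds per label."""
    timings = {}
    for label, write in writers.items():
        start = time.perf_counter()
        write()
        timings[label] = time.perf_counter() - start
    return timings


# Assumed usage with the objects defined above (not runnable without them):
# timings = time_writes({
#     'fsspec ADLS': lambda: ds2.to_zarr(store_azb, mode='w'),
#     'UC Volume LocalStore': lambda: ds2.to_zarr(store_uc, mode='w'),
# })
# for label, seconds in timings.items():
#     print(f'{label}: {seconds:.0f} s')
```

Each writer runs only once here, so repeating the measurement a few times would help separate a real difference from run-to-run noise.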
