Hello,
I have a weird problem in Databricks and I hope you can suggest a solution.
I have an Azure ML blob storage mounted to Databricks, with a folder structure that can be accessed from a notebook as
/dbfs/mnt/azuremount/foo/bar/something.txt
Some of the folders can contain a very large number of subfolders; for example, foo can have more than 1000 subfolders.
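For context, this is roughly how I read the file from a Python cell (just an illustrative sketch, not my exact code):

# read the file through the DBFS FUSE path
with open("/dbfs/mnt/azuremount/foo/bar/something.txt") as f:
    print(f.readline())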
My problem is the following:
After I start a new compute cluster, I cannot access something.txt from any notebook at first. For example, if I run
ls /dbfs/mnt/azuremount/foo/bar/something.txt | head -n 2
in a notebook, Databricks throws an error that there is no file under the path "/dbfs/mnt/azuremount/foo/bar/something.txt".
But if I run the following commands sequentially:
ls /dbfs/mnt/azuremount/ | head -n 2
ls /dbfs/mnt/azuremount/foo/ | head -n 2
ls /dbfs/mnt/azuremount/foo/bar/ | head -n 2
then I can magically open something.txt from any notebook!
I suspect there is some caching issue in the background, but dbutils.fs.refreshMounts() did not fix it when I tried. I would really like to find a solution, because it is incredibly annoying to have to list multiple folders by hand every time I start my compute cluster.
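In the meantime, the only workaround I can think of is to automate the manual listing, roughly like this (just a sketch; warm_dbfs_path is a helper name I made up, and I am assuming os.listdir goes through the same FUSE mount as ls):

import os

def warm_dbfs_path(path):
    # list every parent folder of the path, i.e. the manual ls workaround in a loop
    parts = path.strip("/").split("/")
    current = ""
    for part in parts[:-1]:
        current += "/" + part
        os.listdir(current)

warm_dbfs_path("/dbfs/mnt/azuremount/foo/bar/something.txt")
with open("/dbfs/mnt/azuremount/foo/bar/something.txt") as f:
    print(f.readline())

But I would much rather understand and fix the underlying issue than rely on this.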
Thanks,
Ben