Hello,
I have a weird problem in Databricks and I hope you can suggest a solution.
I have an Azure ML blob storage mounted to Databricks, with a folder structure that can be accessed from a notebook as
/dbfs/mnt/azuremount/foo/bar/something.txt
Some of the folders can contain a very large number of subfolders; for example, foo can have more than 1000 subfolders.
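For context, this is roughly how I read the file from a Python cell (just an illustrative sketch, not my exact code):

# read the file through the DBFS FUSE path
with open("/dbfs/mnt/azuremount/foo/bar/something.txt") as f:
    print(f.readline())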
My problem is the following:
After I start a new compute cluster, I cannot access something.txt from any notebook at first. For example, if I run
ls /dbfs/mnt/azuremount/foo/bar/something.txt | head -n 2
in a notebook, Databricks throws an error that there is no file under the path "/dbfs/mnt/azuremount/foo/bar/something.txt".
But if I run the following commands sequentially:
ls /dbfs/mnt/azuremount/ | head -n 2
ls /dbfs/mnt/azuremount/foo/ | head -n 2
ls /dbfs/mnt/azuremount/foo/bar/ | head -n 2
then I can magically open something.txt from any notebook!
I suspect there is some caching issue in the background, but dbutils.fs.refreshMounts() did not fix it when I tried. I would really like to find a solution, because it is incredibly annoying to have to list multiple folders by hand every time I start my compute cluster.
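In the meantime, the only workaround I can think of is to automate the manual listing, roughly like this (just a sketch; warm_dbfs_path is a helper name I made up, and I am assuming os.listdir goes through the same FUSE mount as ls):

import os

def warm_dbfs_path(path):
    # list every parent folder of the path, i.e. the manual ls workaround in a loop
    parts = path.strip("/").split("/")
    current = ""
    for part in parts[:-1]:
        current += "/" + part
        os.listdir(current)

warm_dbfs_path("/dbfs/mnt/azuremount/foo/bar/something.txt")
with open("/dbfs/mnt/azuremount/foo/bar/something.txt") as f:
    print(f.readline())

But I would much rather understand and fix the underlying issue than rely on this.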
Thanks,
Ben