10-23-2025 12:49 PM
Hi everyone,
I'm currently working with the unstructured data pipeline in Databricks, using the official notebook provided by Databricks without any modifications. Strangely, despite being an out-of-the-box resource, the notebook fails during execution with the following error:
PythonException:
An exception was thrown from the Python worker. Please see the stack trace below.
Traceback (most recent call last):
File <command-1127042695011754>, line 240, in _recursive_character_text_splitter
File <command-1127042695011754>, line 62, in <lambda>
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-30d95ded-138f-42e0-83c5-245d1d30255a/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 817, in from_pretrained
tokenizer_config = get_tokenizer_config(pretrained_model_name_or_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-30d95ded-138f-42e0-83c5-245d1d30255a/lib/python3.12/site-packages/transformers/models/auto/tokenization_auto.py", line 649, in get_tokenizer_config
resolved_config_file = cached_file(
^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-30d95ded-138f-42e0-83c5-245d1d30255a/lib/python3.12/site-packages/transformers/utils/hub.py", line 462, in cached_file
except HFValidationError as e:
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-30d95ded-138f-42e0-83c5-245d1d30255a/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-30d95ded-138f-42e0-83c5-245d1d30255a/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1010, in hf_hub_download
return _hf_hub_download_to_cache_dir(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-30d95ded-138f-42e0-83c5-245d1d30255a/lib/python3.12/site-packages/huggingface_hub/file_download.py", line 1127, in _hf_hub_download_to_cache_dir
os.makedirs(os.path.dirname(blob_path), exist_ok=True)
File "<frozen os>", line 216, in makedirs
File "<frozen os>", line 216, in makedirs
File "<frozen os>", line 216, in makedirs
File "<frozen os>", line 230, in makedirs
OSError: [Errno 30] Read-only file system: '/local_disk0/tmp'
Write not supported
Files in Workspace are read-only from executors. Please consider using Volumes if you need to persist data written from executors.
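The error seems to come from the Hugging Face transformers library trying to download and cache a tokenizer. According to the traceback, from_pretrained ends up in hf_hub_download, which calls os.makedirs to create its cache directory and fails because the executor environment doesn't allow writing to /local_disk0/tmp.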
What's puzzling is that this notebook is supposed to be plug-and-play. Has anyone else encountered this issue? Are there known workarounds or fixes, perhaps involving Volumes or pointing the Hugging Face cache at a different directory?
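To make the question concrete, here is a minimal sketch of the kind of workaround I have in mind. I have not verified it against the notebook. The helper names (first_writable_dir, configure_hf_cache, load_tokenizer) are my own and not part of the notebook, and the candidate directories in the usage note are assumptions. It relies on two things I believe are real: the HF_HOME environment variable and the cache_dir argument of AutoTokenizer.from_pretrained.

```python
import os
import uuid


def first_writable_dir(candidates):
    """Return the first candidate directory that can be created and written to."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            # Probe with a real write; a read-only mount raises OSError here.
            probe = os.path.join(path, f".write_probe_{uuid.uuid4().hex}")
            with open(probe, "w") as fh:
                fh.write("ok")
            os.remove(probe)
            return path
        except OSError:
            continue
    raise OSError(f"no writable cache directory among: {list(candidates)!r}")


def configure_hf_cache(candidates):
    """Point the Hugging Face cache at a writable directory and return it."""
    cache_dir = first_writable_dir(candidates)
    # HF_HOME is read when huggingface_hub is first imported, so this only
    # helps if it runs before that import in the worker process.
    os.environ["HF_HOME"] = cache_dir
    return cache_dir


def load_tokenizer(model_name, candidates):
    """Hypothetical wrapper: load a tokenizer with an explicit cache directory."""
    cache_dir = configure_hf_cache(candidates)
    # Imported lazily so the environment variable is set first.
    from transformers import AutoTokenizer

    # cache_dir is passed explicitly in case HF_HOME was read too early.
    return AutoTokenizer.from_pretrained(model_name, cache_dir=cache_dir)
```

The idea would be to call load_tokenizer inside the function that runs on the executors, with candidates such as a Unity Catalog Volume path followed by tempfile.gettempdir(). Which of those is actually writable from executors is exactly what I am unsure about.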
Any help or insight would be greatly appreciated!
Thanks,
Mariano
- Labels: GenAI and LLMs, GenAI (Generation AI), RAG