How can the shared memory size (/dev/shm) be increased on databricks worker nodes with custom docker images?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
10-28-2021 02:59 AM
PyTorch uses shared memory to efficiently share tensors between its dataloader workers and its main process. However in a docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small to use to share image tensor batches. This means that when using a custom docker image on a databricks cluster it is not possible to use PyTorch with multiple dataloaders. We can fix this by setting the `--shm-size` or `--ipc=host` args on `docker run` - how can this be set on a databricks cluster?
Note that this doesn't affect the default databricks runtime it looks like that is using the linux default of making half the physical RAM available to /dev/shm - 6.9GB on the Standard_DS3_v2 node I tested.
To reproduce: start a cluster using a custom docker image, run `df -h /dev/shm` in a notebook.
Thanks in advance!
- Labels:
-
Deep learning
-
Memory Size
-
Pytorch