PyTorch uses shared memory to efficiently pass tensors between its dataloader workers and its main process. However, in a Docker container the default size of the shared memory (a tmpfs file system mounted at /dev/shm) is 64MB, which is too small for sharing batches of image tensors. This means that when using a custom Docker image on a Databricks cluster, it is not possible to use PyTorch with multiple dataloader workers. Outside Databricks this is fixed by passing `--shm-size` or `--ipc=host` to `docker run` - how can the equivalent be set on a Databricks cluster?
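For context, on plain `docker run` the workaround looks like the snippet below; this is just an illustration of the two options (`my-custom-image` is a placeholder and the 4g size is arbitrary), and it is the behaviour I'm trying to achieve for the container the cluster launches:

```sh
# Option 1: give the container a larger /dev/shm (size here is arbitrary)
docker run --shm-size=4g my-custom-image

# Option 2: share the host's IPC namespace, so /dev/shm is the host's tmpfs
docker run --ipc=host my-custom-image
```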
Note that this doesn't affect the default Databricks runtime: that appears to use the Linux default of making half the physical RAM available at /dev/shm - 6.9GB on the Standard_DS3_v2 node I tested.
To reproduce: start a cluster using a custom Docker image and run `df -h /dev/shm` in a notebook.
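If it's useful, here is a minimal sketch of the kind of workload that hits the limit (synthetic data, not my actual pipeline); with only 64MB in /dev/shm the dataloader workers typically die with a bus error complaining about shared memory:

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomImages(Dataset):
    """Synthetic stand-in for an image dataset (placeholder, not my real data)."""
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        # One ~600KB float32 "image"; batches of these are passed via /dev/shm
        return torch.randn(3, 224, 224)

# num_workers > 0 is what triggers the shared-memory path
loader = DataLoader(RandomImages(), batch_size=64, num_workers=4)

for batch in loader:
    pass  # with a 64MB /dev/shm this loop typically fails with a bus error
```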
Thanks in advance!