I'm building my own Docker images to use for a Databricks cluster. The only image I can get to run is the unmodified official base image "databricksruntime/python:13.3-LTS". As soon as I install a pip package on top of it, I get the following on standard error:
/databricks/spark/scripts/setup_driver_env.sh: line 43: R: command not found
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
ANTLR Tool version 4.8 used for code generation does not match the current runtime version 4.9.3
Thu Mar 21 08:01:25 2024 Connection to spark from PID 1270
Thu Mar 21 08:01:25 2024 Initialized gateway on port 44991
Thu Mar 21 08:01:26 2024 Connected to spark.
Traceback (most recent call last):
  File "/databricks/python_shell/scripts/db_ipykernel_launcher.py", line 86, in <module>
    app.shell.run_line_magic("matplotlib", "inline")
  File "/databricks/python/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 2480, in run_line_magic
    result = fn(*args, **kwargs)
  File "/databricks/python/lib/python3.10/site-packages/IPython/core/magics/pylab.py", line 99, in matplotlib
    gui, backend = self.shell.enable_matplotlib(args.gui.lower() if isinstance(args.gui, str) else args.gui)
  File "/databricks/python/lib/python3.10/site-packages/IPython/core/interactiveshell.py", line 3658, in enable_matplotlib
    gui, backend = pt.find_gui_and_backend(gui, self.pylab_gui_select)
  File "/databricks/python/lib/python3.10/site-packages/IPython/core/pylabtools.py", line 320, in find_gui_and_backend
    import matplotlib
  File "/databricks/python/lib/python3.10/site-packages/matplotlib/__init__.py", line 161, in <module>
    from . import _api, _version, cbook, _docstring, rcsetup
  File "/databricks/python/lib/python3.10/site-packages/matplotlib/cbook/__init__.py", line 32, in <module>
    from matplotlib._api.deprecation import (
ImportError: cannot import name 'mplDeprecation' from 'matplotlib._api.deprecation' (/databricks/python/lib/python3.10/site-packages/matplotlib/_api/deprecation.py)
This is my fairly minimal Dockerfile. Without the last line (the pip install), the image works fine.
# syntax = docker/dockerfile:1.2
FROM databricksruntime/python:13.3-LTS
COPY my_code /my_code
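# Mount the pip config as a build secret so credentials are not baked into a layer.
# The requirement is quoted so the shell cannot treat [extras] as a glob pattern.
RUN --mount=type=secret,id=pip_conf,dst=/root/.pip/pip.conf \
    /databricks/python3/bin/pip install --ignore-installed "my-package[extras]==1.2.3"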
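One possible way to narrow this down is sketched below. It rests on an assumption that is not confirmed by anything above: that the extras pull in a different matplotlib, and that `--ignore-installed` writes it over the copy shipped in the base image without uninstalling the old one. If that is what happens, site-packages would typically contain metadata for two versions of the same distribution. The function name `find_duplicate_distributions` is made up for this illustration; `importlib.metadata.distributions` is part of the Python standard library.

```python
import re
from collections import defaultdict
from importlib.metadata import distributions


def normalize(name: str) -> str:
    # PEP 503 style normalization, so "Matplotlib" and "matplotlib" compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_duplicate_distributions(dists=None):
    """Return {normalized name: sorted versions} for every distribution
    that has metadata for more than one version on the import path.

    `dists` is only a hook for testing; by default the distributions
    visible to the running interpreter are scanned.
    """
    seen = defaultdict(set)
    for dist in (distributions() if dists is None else dists):
        name = dist.metadata["Name"]
        version = dist.version
        if name and version:
            seen[normalize(name)].add(version)
    return {name: sorted(versions) for name, versions in seen.items() if len(versions) > 1}


if __name__ == "__main__":
    # Hypothetical usage: run with the same interpreter that pip installed into.
    for name, versions in sorted(find_duplicate_distributions().items()):
        print(f"{name}: {', '.join(versions)}")
```

Running this inside the built image with the interpreter that belongs to `/databricks/python3/bin/pip` would list any package that has more than one version's metadata installed. An empty result does not rule out a mixed install; it only means this particular symptom is absent.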