<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Issue with VSCode Extension and Databricks Cluster Using Docker Image in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/issue-with-vscode-extension-and-databricks-cluster-using-docker/m-p/80009#M7870</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;thanks for such a quick response.&lt;/P&gt;&lt;P&gt;Actually, I am using the Dockerfile from the Databricks runtime example here:&amp;nbsp;&lt;A href="https://github.com/databricks/containers/blob/master/ubuntu/minimal/Dockerfile" target="_self"&gt;https://github.com/databricks/containers/blob/master/ubuntu/minimal/Dockerfile&lt;/A&gt;&amp;nbsp;. The configuration with the VSCode extensions is fine since I already mentioned that the "upload and run python file" command works with a standard cluster.&lt;/P&gt;&lt;P&gt;This is my Dockerfile:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# This Dockerfile creates a clean Databricks runtime 12.2 LTS without any library ready to deploy to Databricks
FROM databricksruntime/minimal:14.3-LTS
# These are the versions compatible for DBR 12.x

ARG python_version="3.9"
ARG pip_version="22.3.1"
ARG setuptools_version="65.6.3"
ARG wheel_version="0.38.4"
ARG virtualenv_version="20.16.7"

# Set the debconf frontend to Noninteractive
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Installs python 3.x and virtualenv for Spark and Notebooks
RUN sudo apt-get update &amp;amp;&amp;amp; sudo apt-get install dialog apt-utils curl build-essential fuse openssh-server software-properties-common --yes \
    &amp;amp;&amp;amp; sudo add-apt-repository ppa:deadsnakes/ppa -y &amp;amp;&amp;amp; sudo apt-get update \
    &amp;amp;&amp;amp; sudo apt-get install python${python_version} python${python_version}-dev python${python_version}-distutils --yes \
    &amp;amp;&amp;amp; curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
    &amp;amp;&amp;amp; /usr/bin/python${python_version} get-pip.py pip&amp;gt;=${pip_version} setuptools&amp;gt;=${setuptools_version} wheel&amp;gt;=${wheel_version} \
    &amp;amp;&amp;amp; rm get-pip.py \
    &amp;amp;&amp;amp; apt-get clean \
    &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN /usr/local/bin/pip${python_version} install --no-cache-dir virtualenv==${virtualenv_version} \
    &amp;amp;&amp;amp; sed -i -r 's/^(PERIODIC_UPDATE_ON_BY_DEFAULT) = True$/\1 = False/' /usr/local/lib/python${python_version}/dist-packages/virtualenv/seed/embed/base_embed.py \
    &amp;amp;&amp;amp; /usr/local/bin/pip${python_version} download pip==${pip_version} --dest \
    /usr/local/lib/python${python_version}/dist-packages/virtualenv_support/

# Initialize the default environment that Spark and notebooks will use
RUN virtualenv --python=python${python_version} --system-site-packages /databricks/python3 --no-download  --no-setuptools

# These python libraries are used by Databricks notebooks and the Python REPL
# You do not need to install pyspark - it is injected when the cluster is launched
# Versions are intended to reflect latest DBR: https://docs.databricks.com/release-notes/runtime/11.1.html#system-environment
RUN /databricks/python3/bin/pip install \
    six&amp;gt;=1.16.0 \
    jedi&amp;gt;=0.18.1 \
    # ensure minimum ipython version for Python autocomplete with jedi 0.17.x
    ipython&amp;gt;=8.10.0 \
    pyarrow&amp;gt;=8.0.0 \
    ipykernel&amp;gt;=6.17.1 \
    grpcio&amp;gt;=1.48.1 \
    grpcio-status&amp;gt;=1.48.1 \
    databricks-sdk&amp;gt;=0.1.6

# Specifies where Spark will look for the python process
ENV PYSPARK_PYTHON=/databricks/python3/bin/python3
# Specifies Tracking URI for MLflow Integration
ENV MLFLOW_TRACKING_URI='databricks'
# Make sure the USER env variable is set. The files exposed
# by dbfs-fuse will be owned by this user.
# Within the container, the USER is always root.
ENV USER root&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Mon, 22 Jul 2024 20:41:42 GMT</pubDate>
    <dc:creator>danmlopsmaz</dc:creator>
    <dc:date>2024-07-22T20:41:42Z</dc:date>
    <item>
      <title>Issue with VSCode Extension and Databricks Cluster Using Docker Image</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-vscode-extension-and-databricks-cluster-using-docker/m-p/79550#M7868</link>
      <description>&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;DIV class=""&gt;&lt;P&gt;I've encountered a significant issue while using the VSCode extension for Databricks, particularly when working with a cluster configured with a Docker image. Here's a detailed description of the problem:&lt;/P&gt;&lt;H3&gt;Problem Description&lt;/H3&gt;&lt;P&gt;When attempting to upload and execute a Python file with VSCode to a Databricks cluster that utilizes a custom Docker image, the connection fails, and the extension does not function as expected.&lt;/P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="danmlopsmaz_0-1721482196211.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/9724i3DD4F408CDD65E4B/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="danmlopsmaz_0-1721482196211.png" alt="danmlopsmaz_0-1721482196211.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;==============================
Errors in 00-databricks-init-3331c3ed293013bfec5837e683d00cfe.py:
 
WARNING -  All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00 - 00: 1721481367.546267  105941 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache&lt;/LI-CODE&gt;&lt;LI-CODE lang="markup"&gt;Error: CommandExecution.createAndWait: failed to reach Running state, got Error: [object Object]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="danmlopsmaz_1-1721482508346.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/9725i4A98CE8CB85433B2/image-size/medium/is-moderation-mode/true?v=v2&amp;amp;px=400" role="button" title="danmlopsmaz_1-1721482508346.png" alt="danmlopsmaz_1-1721482508346.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;7/20/2024, 8:34:11 AM - Creating execution context on cluster 0719 ...
Error: CommandExecution.createAndWait: failed to reach Running state, got Error: [object Object]
Execution terminated&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;H3&gt;Potential Workarounds&lt;/H3&gt;&lt;UL&gt;&lt;LI&gt;&lt;P&gt;&lt;STRONG&gt;Databricks connect&lt;/STRONG&gt;: Run the databricks connect in a terminal works to execute the spark code in the cluster. But, the VS Code extension does not.&lt;/P&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;H3&gt;Note&lt;/H3&gt;&lt;P&gt;It is important to mention that when I run the same Python file with a standard cluster with no docker on it, the VSCode extension works as expected.&lt;/P&gt;</description>
      <pubDate>Sat, 20 Jul 2024 13:51:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-vscode-extension-and-databricks-cluster-using-docker/m-p/79550#M7868</guid>
      <dc:creator>danmlopsmaz</dc:creator>
      <dc:date>2024-07-20T13:51:43Z</dc:date>
    </item>
    <item>
      <title>Re: Issue with VSCode Extension and Databricks Cluster Using Docker Image</title>
      <link>https://community.databricks.com/t5/get-started-discussions/issue-with-vscode-extension-and-databricks-cluster-using-docker/m-p/80009#M7870</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/9"&gt;@Retired_mod&lt;/a&gt;&amp;nbsp;thanks for such a quick response.&lt;/P&gt;&lt;P&gt;Actually, I am using the Dockerfile from the Databricks runtime example here:&amp;nbsp;&lt;A href="https://github.com/databricks/containers/blob/master/ubuntu/minimal/Dockerfile" target="_self"&gt;https://github.com/databricks/containers/blob/master/ubuntu/minimal/Dockerfile&lt;/A&gt;&amp;nbsp;. The configuration with the VSCode extensions is fine since I already mentioned that the "upload and run python file" command works with a standard cluster.&lt;/P&gt;&lt;P&gt;This is my Dockerfile:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;# This Dockerfile creates a clean Databricks runtime 12.2 LTS without any library ready to deploy to Databricks
FROM databricksruntime/minimal:14.3-LTS
# These are the versions compatible for DBR 12.x

ARG python_version="3.9"
ARG pip_version="22.3.1"
ARG setuptools_version="65.6.3"
ARG wheel_version="0.38.4"
ARG virtualenv_version="20.16.7"

# Set the debconf frontend to Noninteractive
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Installs python 3.x and virtualenv for Spark and Notebooks
RUN sudo apt-get update &amp;amp;&amp;amp; sudo apt-get install dialog apt-utils curl build-essential fuse openssh-server software-properties-common --yes \
    &amp;amp;&amp;amp; sudo add-apt-repository ppa:deadsnakes/ppa -y &amp;amp;&amp;amp; sudo apt-get update \
    &amp;amp;&amp;amp; sudo apt-get install python${python_version} python${python_version}-dev python${python_version}-distutils --yes \
    &amp;amp;&amp;amp; curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
    &amp;amp;&amp;amp; /usr/bin/python${python_version} get-pip.py pip&amp;gt;=${pip_version} setuptools&amp;gt;=${setuptools_version} wheel&amp;gt;=${wheel_version} \
    &amp;amp;&amp;amp; rm get-pip.py \
    &amp;amp;&amp;amp; apt-get clean \
    &amp;amp;&amp;amp; rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN /usr/local/bin/pip${python_version} install --no-cache-dir virtualenv==${virtualenv_version} \
    &amp;amp;&amp;amp; sed -i -r 's/^(PERIODIC_UPDATE_ON_BY_DEFAULT) = True$/\1 = False/' /usr/local/lib/python${python_version}/dist-packages/virtualenv/seed/embed/base_embed.py \
    &amp;amp;&amp;amp; /usr/local/bin/pip${python_version} download pip==${pip_version} --dest \
    /usr/local/lib/python${python_version}/dist-packages/virtualenv_support/

# Initialize the default environment that Spark and notebooks will use
RUN virtualenv --python=python${python_version} --system-site-packages /databricks/python3 --no-download  --no-setuptools

# These python libraries are used by Databricks notebooks and the Python REPL
# You do not need to install pyspark - it is injected when the cluster is launched
# Versions are intended to reflect latest DBR: https://docs.databricks.com/release-notes/runtime/11.1.html#system-environment
RUN /databricks/python3/bin/pip install \
    six&amp;gt;=1.16.0 \
    jedi&amp;gt;=0.18.1 \
    # ensure minimum ipython version for Python autocomplete with jedi 0.17.x
    ipython&amp;gt;=8.10.0 \
    pyarrow&amp;gt;=8.0.0 \
    ipykernel&amp;gt;=6.17.1 \
    grpcio&amp;gt;=1.48.1 \
    grpcio-status&amp;gt;=1.48.1 \
    databricks-sdk&amp;gt;=0.1.6

# Specifies where Spark will look for the python process
ENV PYSPARK_PYTHON=/databricks/python3/bin/python3
# Specifies Tracking URI for MLflow Integration
ENV MLFLOW_TRACKING_URI='databricks'
# Make sure the USER env variable is set. The files exposed
# by dbfs-fuse will be owned by this user.
# Within the container, the USER is always root.
ENV USER root&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jul 2024 20:41:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/issue-with-vscode-extension-and-databricks-cluster-using-docker/m-p/80009#M7870</guid>
      <dc:creator>danmlopsmaz</dc:creator>
      <dc:date>2024-07-22T20:41:42Z</dc:date>
    </item>
  </channel>
</rss>

