
Issue with VSCode Extension and Databricks Cluster Using Docker Image

danmlopsmaz
New Contributor II

I've encountered a significant issue while using the VSCode extension for Databricks, particularly when working with a cluster configured with a Docker image. Here's a detailed description of the problem:

Problem Description

When attempting to upload and execute a Python file from VSCode on a Databricks cluster that uses a custom Docker image, the connection fails and the extension does not function as expected.

(Screenshot attachment: danmlopsmaz_0-1721482196211.png)

 

==============================
Errors in 00-databricks-init-3331c3ed293013bfec5837e683d00cfe.py:
 
WARNING - All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1721481367.546267  105941 config.cc:230] gRPC experiments enabled: call_status_override_on_cancellation, event_engine_dns, event_engine_listener, http2_stats_fix, monitoring_experiment, pick_first_new, trace_record_callops, work_serializer_clears_time_cache
Error: CommandExecution.createAndWait: failed to reach Running state, got Error: [object Object]

 

(Screenshot attachment: danmlopsmaz_1-1721482508346.png)

 

7/20/2024, 8:34:11 AM - Creating execution context on cluster 0719 ...
Error: CommandExecution.createAndWait: failed to reach Running state, got Error: [object Object]
Execution terminated

 

Potential Workarounds

  • Databricks Connect: running Databricks Connect from a terminal works and executes the Spark code on the cluster, but the VSCode extension still does not (see the sketch below).
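
For reference, the Databricks Connect check that does work looks roughly like this (a minimal sketch, assuming Databricks Connect 13+ and a ~/.databrickscfg profile pointing at the Docker-backed cluster; the profile name is a placeholder):

from databricks.connect import DatabricksSession

# Placeholder profile name; the profile must reference the workspace and the Docker-backed cluster.
spark = DatabricksSession.builder.profile("DEFAULT").getOrCreate()

# If this prints rows, Spark commands reach the cluster even though the extension does not.
print(spark.range(5).collect())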

Note

It is important to mention that when I run the same Python file on a standard cluster without a Docker image, the VSCode extension works as expected.
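
For context, the file being uploaded is an ordinary PySpark script along these lines (a representative sketch, not the exact file):

from pyspark.sql import SparkSession

# The Spark context is provided by the remote cluster; nothing here depends on the Docker image.
spark = SparkSession.builder.getOrCreate()
spark.range(10).show()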

1 REPLY

danmlopsmaz
New Contributor II

Hi @Retired_mod, thanks for such a quick response.

Actually, I am using the Dockerfile from the Databricks runtime example here: https://github.com/databricks/containers/blob/master/ubuntu/minimal/Dockerfile. The configuration of the VSCode extension is fine, since, as I mentioned, the "upload and run Python file" command works with a standard cluster.

This is my Dockerfile:

# This Dockerfile creates a clean Databricks runtime 12.2 LTS without any library ready to deploy to Databricks
FROM databricksruntime/minimal:14.3-LTS
# These are the versions compatible for DBR 12.x

ARG python_version="3.9"
ARG pip_version="22.3.1"
ARG setuptools_version="65.6.3"
ARG wheel_version="0.38.4"
ARG virtualenv_version="20.16.7"

# Set the debconf frontend to Noninteractive
RUN echo 'debconf debconf/frontend select Noninteractive' | debconf-set-selections

# Installs python 3.x and virtualenv for Spark and Notebooks
RUN sudo apt-get update && sudo apt-get install dialog apt-utils curl build-essential fuse openssh-server software-properties-common --yes \
    && sudo add-apt-repository ppa:deadsnakes/ppa -y && sudo apt-get update \
    && sudo apt-get install python${python_version} python${python_version}-dev python${python_version}-distutils --yes \
    && curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py \
    && /usr/bin/python${python_version} get-pip.py "pip>=${pip_version}" "setuptools>=${setuptools_version}" "wheel>=${wheel_version}" \
    && rm get-pip.py \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

RUN /usr/local/bin/pip${python_version} install --no-cache-dir virtualenv==${virtualenv_version} \
    && sed -i -r 's/^(PERIODIC_UPDATE_ON_BY_DEFAULT) = True$/\1 = False/' /usr/local/lib/python${python_version}/dist-packages/virtualenv/seed/embed/base_embed.py \
    && /usr/local/bin/pip${python_version} download pip==${pip_version} --dest \
    /usr/local/lib/python${python_version}/dist-packages/virtualenv_support/

# Initialize the default environment that Spark and notebooks will use
RUN virtualenv --python=python${python_version} --system-site-packages /databricks/python3 --no-download  --no-setuptools

# These python libraries are used by Databricks notebooks and the Python REPL
# You do not need to install pyspark - it is injected when the cluster is launched
# Versions are intended to reflect latest DBR: https://docs.databricks.com/release-notes/runtime/11.1.html#system-environment
RUN /databricks/python3/bin/pip install \
    "six>=1.16.0" \
    "jedi>=0.18.1" \
    # ensure minimum ipython version for Python autocomplete with jedi 0.17.x
    "ipython>=8.10.0" \
    "pyarrow>=8.0.0" \
    "ipykernel>=6.17.1" \
    "grpcio>=1.48.1" \
    "grpcio-status>=1.48.1" \
    "databricks-sdk>=0.1.6"

# Specifies where Spark will look for the python process
ENV PYSPARK_PYTHON=/databricks/python3/bin/python3
# Specifies Tracking URI for MLflow Integration
ENV MLFLOW_TRACKING_URI='databricks'
# Make sure the USER env variable is set. The files exposed
# by dbfs-fuse will be owned by this user.
# Within the container, the USER is always root.
ENV USER root
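
For completeness, the image built from this Dockerfile is attached to the cluster through the Databricks Container Services docker_image setting. A rough sketch with the Databricks Python SDK, where the image URL, node type, and Spark version are placeholder assumptions:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import DockerImage

# Credentials come from the DEFAULT profile or environment variables.
w = WorkspaceClient()

# Placeholder image URL and node type; the URL should point at the pushed build of the Dockerfile above.
w.clusters.create(
    cluster_name="docker-image-test",
    spark_version="14.3.x-scala2.12",
    node_type_id="Standard_DS3_v2",
    num_workers=1,
    docker_image=DockerImage(url="<registry>/<repo>:14.3"),
    autotermination_minutes=30,
)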

 
