Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks cluster starts with docker

nachog99
New Contributor II

Hi there!

I hope you are doing well.

I'm trying to start a cluster with a docker image to install all the libraries that I have to use.

I have the following Dockerfile, which installs only Python libraries:

FROM databricksruntime/standard
WORKDIR /app
COPY . .
RUN apt-get update && apt-get install -y python3-pip
RUN sudo apt-get install -y libpq-dev
RUN pip install -r /app/requirements.txt
CMD ["python3"]

Does anybody know how to install Maven libraries from this same Dockerfile? I've tried and looked up many solutions, but I can't figure out how to do it.

The last thing I tried was a multi-stage build using the Maven image, but I had trouble with the dependencies (missing pom.xml file).

# MAVEN + PYTHON
 
FROM databricksruntime/standard
WORKDIR /app
COPY . .
RUN apt-get update && apt-get install -y python3-pip
RUN sudo apt-get install -y libpq-dev
RUN pip install -r /app/requirements.txt
CMD ["python3"]
 
FROM maven:latest
WORKDIR /root
COPY --from=0 /app .
 
RUN mvn clean install org.apache.maven.plugins:maven-dependency-plugin:2.1:get \
    -DrepoUrl=https://mvnrepository.com/artifact/com.crealytics/spark-excel_2.12/0.14.0 \
    -Dartifact=com.crealytics:spark-excel_2.12:0.14.0
 
RUN mvn clean install org.apache.maven.plugins:maven-dependency-plugin:2.1:get \
    -DrepoUrl=https://mvnrepository.com/artifact/mysql/mysql-connector-java \
    -Dartifact=mysql:mysql-connector-java:8.0.29

I don't understand how to install Maven libraries from a Dockerfile.

If someone knows how to do this and could help me, I would appreciate it a lot.

Thanks!

4 REPLIES

Thanks! I have tried some answers from the Stack Overflow discussion, and I can now build the image, but I can't run it.

I can build the image using the flag dependency:solve

But the libraries still don't get installed. When I run the image I get the following message, so the cluster can't start:

[image]

Anyway, I'm grateful for your answer because it helped me keep learning about how to resolve this issue. I appreciate it.

axb0
New Contributor III

1) Install your jars in a new layer, not in the same layer as the Python libraries.

2) Installing with Maven is more work than putting the built library jar directly into your jar layer.
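
A minimal sketch of what these two points could look like for the libraries in the question. It keeps a single stage based on the Databricks runtime image (in the multi-stage attempt above the last FROM is maven:latest, so the resulting image is a Maven image, not a Databricks runtime image) and downloads the pre-built jars in their own layer instead of running Maven. Assumptions, not confirmed anywhere in this thread: that jars placed in /databricks/jars end up on the cluster classpath, and that the jars are published on Maven Central under the standard repository layout shown in the URLs.

```dockerfile
FROM databricksruntime/standard

# Layer 1: Python libraries, as in the original Dockerfile.
WORKDIR /app
COPY requirements.txt .
RUN apt-get update \
 && apt-get install -y python3-pip libpq-dev curl \
 && rm -rf /var/lib/apt/lists/*
RUN pip install -r /app/requirements.txt

# Layer 2: jars only. /databricks/jars is an assumed location.
# Each curl call fetches a single jar; transitive dependencies are NOT downloaded.
RUN mkdir -p /databricks/jars \
 && curl -fsSL -o /databricks/jars/spark-excel_2.12-0.14.0.jar \
      https://repo1.maven.org/maven2/com/crealytics/spark-excel_2.12/0.14.0/spark-excel_2.12-0.14.0.jar \
 && curl -fsSL -o /databricks/jars/mysql-connector-java-8.0.29.jar \
      https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.29/mysql-connector-java-8.0.29.jar
```

Because only the named jars are fetched, a library that needs other jars at runtime can still fail with a NoClassDefFoundError; in that case the dependencies have to be added to the same layer as well.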

Vidula
Honored Contributor

Hi @Ignacio Guillamondegui​ 

Hope all is well! Just wanted to check in if you were able to resolve your issue and would you be happy to share the solution or mark an answer as best? Else please let us know if you need more help. 

We'd love to hear from you.

Thanks!

xneg
Contributor

Hi! I am facing a similar issue.

I tried to use this Dockerfile:

FROM databricksruntime/standard:10.4-LTS
 
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y maven && rm -rf /var/lib/apt/lists/*
 
RUN /databricks/python3/bin/pip install databricks-cli
 
RUN mkdir /databricks/jars
 
RUN mvn org.apache.maven.plugins:maven-dependency-plugin:2.8:get -Dartifact=com.microsoft.azure.kusto:kusto-spark_3.0_2.12:2.5.2 -Ddest=/databricks/jars/
 
RUN /databricks/python3/bin/pip install azure-kusto-data==2.1.1

But it looks like it doesn't work. I get the error java.lang.NoClassDefFoundError: com/microsoft/azure/kusto/data/exceptions/DataServiceException

If I install the libraries through the UI, as in the screenshot, everything works.

[Screenshot 2023-03-30 at 10.49.33]
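
One likely explanation, offered as a guess rather than a confirmed diagnosis: dependency:get with -Ddest copies only the requested artifact to the destination, while the missing class lives in a transitive dependency of kusto-spark, and the UI install resolves those dependencies for you. A sketch of one way to copy the whole runtime dependency set into the image is below. The throwaway pom.xml, its local.example coordinates, and the /tmp/deps path are hypothetical, and it still assumes that /databricks/jars is picked up by the cluster.

```dockerfile
FROM databricksruntime/standard:10.4-LTS

ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y maven && rm -rf /var/lib/apt/lists/*

# Hypothetical throwaway project whose only job is to declare the dependency.
RUN mkdir -p /tmp/deps && printf '%s\n' \
    '<project xmlns="http://maven.apache.org/POM/4.0.0">' \
    '  <modelVersion>4.0.0</modelVersion>' \
    '  <groupId>local.example</groupId>' \
    '  <artifactId>cluster-deps</artifactId>' \
    '  <version>1.0</version>' \
    '  <packaging>pom</packaging>' \
    '  <dependencies>' \
    '    <dependency>' \
    '      <groupId>com.microsoft.azure.kusto</groupId>' \
    '      <artifactId>kusto-spark_3.0_2.12</artifactId>' \
    '      <version>2.5.2</version>' \
    '    </dependency>' \
    '  </dependencies>' \
    '</project>' > /tmp/deps/pom.xml

# Copy the artifact plus its compile/runtime dependencies.
# includeScope=runtime leaves out "provided" and "test" scoped dependencies.
RUN mkdir -p /databricks/jars \
 && mvn -B -f /tmp/deps/pom.xml dependency:copy-dependencies \
      -DincludeScope=runtime \
      -DoutputDirectory=/databricks/jars
```

Copying a full dependency tree can also bring in jars that clash with versions already shipped in the runtime, so it is worth checking the contents of /databricks/jars after the build before starting a cluster from the image.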
