Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I am encountering an issue while attempting to create a data profile on clusters using Docker Container Service (version 10.4 LTS). I keep receiving the following exception: `java.nio.charset.MalformedInputException: Input length = 1`. What's puzzling is ...
Hi @Adrianna Klank, we haven't heard from you since the last response from @Akash Bhat, and I was checking back to see if the suggestion helped you. Otherwise, if you have a solution, please share it with the community, as it can be helpful to other...
Hi @Kevin Kim, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we ...
Hi there! I hope you are doing well. I'm trying to start a cluster with a Docker image to install all the libraries that I have to use. I have the following Dockerfile to install only Python libraries, as you can see:
FROM databricksruntime/standard
WORKDIR /...
Hi! I am facing a similar issue. I tried to use this one:
FROM databricksruntime/standard:10.4-LTS
ENV DEBIAN_FRONTEND=noninteractive
RUN apt update && apt install -y maven && rm -rf /var/lib/apt/lists/*
RUN /databricks/python3/bin/pip install datab...
When using your own Docker container while creating a Databricks cluster, what is the mapping between the number of containers launched and the nodes launched? Is it a 1:1 mapping, or is it similar to other orchestration frameworks like Kubernetes? Or is ...
Hey everyone, I'm experimenting with running containerized PySpark jobs in Databricks and orchestrating them with Airflow. I am, however, encountering an issue here. When I trigger an Airflow DAG and look at the logs, I see that Airflow is spinni...
Both, I guess? Yes, all jobs share the same config. The question I have is why there are 3 job runs in the same Airflow task log. I'm hoping that there's something in the configs that may give me some kind of clue.
Quite soon after moving from VMs to containers, I started crafting my own images. That way, notebooks already have all the necessary libraries, and there is no need to do any pip installing in the notebook. As requirements get more complex, now I'm at ...
Hi @Jari Turkia, please check if this helps: https://developers.redhat.com/blog/2019/04/24/how-to-run-systemd-in-a-container#other_cool_features_about_podman_and_systemd Also, you can run an Ubuntu/Red Hat Linux OS inside containers, which will have sys...
Hi, we are currently using an Azure AAD token in order to authenticate with Databricks instead of generating personal access tokens from Databricks. We have a multi-tenant architecture, so we are using Azure Container Instances to run multiple trans...
Hi,
I'd like to ask how many resources you plan to dedicate to the maintenance and development of the official Databricks Docker images. Do you have a view on the longer-term plan for these Docker images?
It seems to be maintained, but i...
When running Docker for a long time, a lot of images accumulate in the system. How can I safely remove all unused Docker images at once to free up storage? In addition, I also want to remove images pulled months ago. So, I'm not asking for removing...
Hey there @william smith, hope everything is going great! Does @Prabakar Ammeappin's response answer your question? If yes, would you be happy to mark it as best so that other members can find the solution more quickly? Else please let us know if you...
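For readers landing on this thread, here is a minimal sketch of one common approach, not necessarily the answer given in the thread. It only builds the argument list for the Docker CLI's `docker image prune` command. The helper name `prune_command` and its parameters are hypothetical. Note that Docker's `until` filter matches on image creation time, not on when the image was pulled, so it only approximates "pulled months ago".

```python
from typing import List, Optional


def prune_command(older_than_hours: Optional[int] = None, force: bool = True) -> List[str]:
    """Build a `docker image prune` command line (hypothetical helper).

    --all removes every image not used by a container, not just dangling ones.
    --force skips the interactive confirmation prompt.
    --filter until=<duration> limits pruning to images created before that age.
    """
    cmd = ["docker", "image", "prune", "--all"]
    if force:
        cmd.append("--force")
    if older_than_hours is not None:
        # Go-style duration string; 720h is roughly 30 days.
        cmd += ["--filter", f"until={older_than_hours}h"]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(" ".join(prune_command(older_than_hours=720)))
```

Passing the resulting list to `subprocess.run` would execute it; printing it first is the safer choice on a shared host.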
Is there a working setup for exporting metrics to CloudWatch while using custom Docker images for cluster creation? I've tried to set up the CloudWatch agent manually, but launching `amazon-cloudwatch-agent-ctl` in the bootstrap script fails wi...
We do not support Ganglia with custom Docker containers either, but let me cross-verify whether we support CloudWatch for the same. Sorry for the inconvenience, @Sergey Ivanychev.
Is it possible to deploy an MLflow model to a SageMaker endpoint where the image URL does not refer to an image in ECR, but the image is instead present in a private Docker registry?
@Saurabh Verma, this to create the endpoint. Also, check this out: https://github.com/mlflow/mlflow/blob/0fa849ad75e5733bf76cc14a4455657c5c32f107/mlflow/sagemaker/__init__.py#L361
Hello, are the Databricks runtimes from Docker Hub (https://hub.docker.com/r/databricksruntime/standard) the same as the actual runtimes inside Databricks? I mean, when we build our own Docker image from databricksruntime/standard, will there be the same dependencies...
Hello, how are you? I hope you are doing well! I'm trying to use a Databricks image (link: containers/ubuntu/R at master · databricks/containers (github.com)) to run a container when starting a cluster. I need RStudio to be installed on the contain...
Specifically, we have in mind:
* Create a Databricks job for testing API changes (the API library is built in a custom Jar file)
* When we want to test an API change, build a Docker image with the relevant changes in a Jar file
* Update the job configur...
> Where do we put custom Jar files when building the Docker image?
/databricks/jars
> How do we update the job configuration so that the job's cluster will be built with this new Docker image, and how long do we expect this re-configuring process to tak...
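As a rough illustration of the second question, the sketch below builds a request body that points a job cluster at a new image, assuming the job is updated through the Jobs API 2.1 `jobs/update` endpoint and that the cluster spec carries the image in its `docker_image` field. The helper names, the image URL, the runtime version, the node type and the cluster key are all placeholders, not values from the thread. The code only constructs the JSON body and sends nothing.

```python
import json
from typing import Any, Dict


def cluster_spec_with_image(image_url: str, spark_version: str,
                            node_type_id: str, num_workers: int) -> Dict[str, Any]:
    """Cluster spec whose containers are started from a custom Docker image."""
    return {
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # A private registry would also need a "basic_auth" entry here.
        "docker_image": {"url": image_url},
    }


def job_update_body(job_id: int, job_cluster_key: str,
                    new_cluster: Dict[str, Any]) -> Dict[str, Any]:
    """Body for a jobs/update call that swaps in the given cluster spec."""
    return {
        "job_id": job_id,
        "new_settings": {
            "job_clusters": [
                {"job_cluster_key": job_cluster_key, "new_cluster": new_cluster}
            ]
        },
    }


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    spec = cluster_spec_with_image(
        image_url="registry.example.com/team/api-test:build-123",
        spark_version="10.4.x-scala2.12",
        node_type_id="i3.xlarge",
        num_workers=2,
    )
    print(json.dumps(job_update_body(42, "api_test_cluster", spec), indent=2))
```

Tagging each image build with a unique tag, as in the placeholder URL, makes it explicit which Jar version a given job run used.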