06-20-2022 01:51 AM
Hi Team,
We are wondering if there is a recommended way to install the Chromium browser and ChromeDriver on Databricks Runtime 10.4 and above?
I have been through the site and found several posts on this, but they all seem to install the Chromium browser from the official Canonical PPA (ppa:canonical-chromium-builds/stage).
FYI, please use the contents of this link at your own risk:
https://community.databricks.com/s/question/0D53f00001qEWtxCAG/chromedriver-installation-in-databricks
From my understanding, DBX Runtime 10.4 is based on Ubuntu 20.04 (focal), for which builds do not seem to be available in that PPA. Maybe that is because Ubuntu moved from shipping an official deb to installing Chromium as a snap?
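To confirm which Ubuntu release a given runtime is built on before picking a PPA or package source, a small check like the one below can help. It is only a sketch: `os_codename` is a hypothetical helper name, and it assumes the runtime image ships a standard `/etc/os-release` file with a `VERSION_CODENAME` field, as stock Ubuntu does.

```shell
# Print the Ubuntu codename (e.g. "focal") from an os-release style file.
# os_codename is a hypothetical helper; the file path defaults to /etc/os-release.
os_codename() {
    file="${1:-/etc/os-release}"
    # Keep only the value of the VERSION_CODENAME line and strip any quotes
    sed -n 's/^VERSION_CODENAME=//p' "$file" | tr -d '"'
}
```

Defining the function and calling `os_codename` in the same `%sh` cell should print `focal` on an Ubuntu 20.04 based runtime.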
So questions are as follows
08-25-2022 07:47 AM
Hi @Bharath Kumar Ramachandran @Vidula Khanna, I tried every way I could find to install Chromium, but each attempt failed at one point or another, and going back to troubleshoot each failure is really time-consuming. To keep it simple, you can try using the Docker container services: install Chromium at the container level and use that image.
08-29-2022 11:48 PM
Thank you so much for your response and apologies for all the trouble.
We had considered the custom runtime option suggested, but it was turned down because creating and maintaining a custom runtime is not something we can take on at the moment. The maintenance fixes that flow down from time to time are also something to consider. For now, we plan to either pause the requirement or move it to a different, existing option.
Thank you once again for all the help.
11-09-2022 06:27 AM
Hi @Bharath Kumar Ramachandran, I've created a new version of the Selenium on Databricks manual. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL
11-15-2022 02:10 AM
Thank you @Hubert Dudek for your script.
If possible, I suggest including a reference to the link below, with a short description, so that users can still figure out the process if Google changes some of the steps in the future.
https://www.chromium.org/getting-involved/download-chromium/
11-10-2022 05:45 PM
Has anyone figured out how to get Selenium to work on Databricks seamlessly?
Beginning to question whether it's a common occurrence for the Databricks support team to not get back to disgruntled users on time...
This request has been opened for some months with no satisfactory response.
Does anyone know of an online resource that could guide us through this issue? Any response at this stage would be appreciated.
06-27-2023 04:19 AM
I had to install a Chromium-based browser on Databricks because it is required by the R package webshot2.
After trying a lot of things, I was able to install google-chrome using this script.
Our cluster is running DBX Runtime 13.1, which is based on Ubuntu 22.04.2 LTS.
%sh
# Add Google's package signing key and add the repository to /etc/apt/sources.list.d
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update
# You might have to do this if you run into a broken-repository issue while pulling dependencies for your release.
wget http://ge.archive.ubuntu.com/ubuntu/pool/main/m/mesa/libgbm1_22.0.1-1ubuntu2_amd64.deb
sudo apt install -y ./libgbm1_22.0.1-1ubuntu2_amd64.deb
# Finally, install Chrome. Since the R function that invokes it has no option to pass arguments, I had to append --no-sandbox directly to the last line of the launcher script.
sudo apt install -y google-chrome-stable
sudo sed -i '$ s/$/ --no-sandbox/' /opt/google/chrome/google-chrome
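One thing to watch with the `sed` line: it appends the flag every time it runs, so re-running the cell leaves `--no-sandbox` on the launcher line more than once. A guarded variant might look like the sketch below. It is not a tested Databricks recipe: `add_no_sandbox` is a made-up helper name, it assumes GNU `sed` (for `-i`), and it takes the launcher path as an argument.

```shell
# Append --no-sandbox to the last line of a launcher script, but only once.
# add_no_sandbox is a hypothetical helper; pass the launcher path as $1.
add_no_sandbox() {
    launcher="$1"
    # Skip the edit if the flag is already present anywhere in the file
    if grep -q -- '--no-sandbox' "$launcher"; then
        return 0
    fi
    # '$' addresses the last line; s/$/.../ appends to the end of that line
    sed -i '$ s/$/ --no-sandbox/' "$launcher"
}
```

Usage would be `add_no_sandbox /opt/google/chrome/google-chrome`, run with enough privileges to modify that file.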
06-27-2023 06:02 AM
If you decide to go with option 3 and use google-chrome, you could use this cluster init script.
#!/bin/bash
export CHROME_DRIVER_URL=https://chromedriver.storage.googleapis.com
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update
wget http://ge.archive.ubuntu.com/ubuntu/pool/main/m/mesa/libgbm1_22.0.1-1ubuntu2_amd64.deb
sudo apt install -y ./libgbm1_22.0.1-1ubuntu2_amd64.deb
sudo apt install -y google-chrome-stable
sudo sed -i '$ s/$/ --no-sandbox/' /opt/google/chrome/google-chrome
# Look up the ChromeDriver release that matches the installed Chrome major version, then download it
CHROME_MAJOR=$(google-chrome --version | cut -d' ' -f3 | cut -d'.' -f1)
CHROME_DRIVER_VERSION=$(wget -q -O - "${CHROME_DRIVER_URL}/LATEST_RELEASE_${CHROME_MAJOR}")
wget "${CHROME_DRIVER_URL}/${CHROME_DRIVER_VERSION}/chromedriver_linux64.zip"
sudo unzip -o chromedriver_linux64.zip -d /opt/chromedriver
sudo ln -sf /opt/chromedriver/chromedriver /bin/chromedriver
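After the init script runs, it may be worth checking that the browser and driver major versions actually line up, since a mismatch is a common reason Selenium sessions fail to start. The sketch below parses the two `--version` strings. The helper names are made up, and it assumes the outputs look like `Google Chrome 114.0.5735.198` and `ChromeDriver 114.0.5735.90 (...)`. Also note that, as far as I know, chromedriver.storage.googleapis.com only carries drivers up to Chrome 114, so the lookup in the script may need adjusting for newer Chrome releases.

```shell
# Extract the major version from `google-chrome --version` output,
# e.g. "Google Chrome 114.0.5735.198" -> 114 (same cut pipeline as the init script)
chrome_major() {
    echo "$1" | cut -d' ' -f3 | cut -d'.' -f1
}

# Extract the major version from `chromedriver --version` output,
# e.g. "ChromeDriver 114.0.5735.90 (...)" -> 114 (assumed output format)
driver_major() {
    echo "$1" | cut -d' ' -f2 | cut -d'.' -f1
}

# Succeed only when both major versions are non-empty and equal
versions_match() {
    c=$(chrome_major "$1")
    d=$(driver_major "$2")
    [ -n "$c" ] && [ "$c" = "$d" ]
}
```

A possible check at the end of the init script: `versions_match "$(google-chrome --version)" "$(chromedriver --version)" || echo "Chrome/ChromeDriver major versions differ"`.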
02-13-2024 09:44 AM - edited 02-13-2024 09:46 AM
Look into Playwright instead of Selenium. I went through the same process y'all went through here (and ended up writing an init script to install the drivers, etc.).
This is all done for you in Playwright. Refer to this post - I hope it helps!!
https://community.databricks.com/t5/community-discussions/using-python-rpa-library-on-databricks/td-...
Oh - and you also no longer have to manage your driver versions 😉 That's a plus!
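For anyone who wants to try that route, the setup might look roughly like the sketch below. It assumes the Python `playwright` package is what you want, that `pip` on the cluster can reach PyPI, and that the process has permission to install system packages. `install_playwright_chromium` is just a name I made up for wrapping the steps, and I have not verified this on a specific Databricks runtime.

```shell
# Hypothetical wrapper around the Playwright setup steps (defined, not called here).
install_playwright_chromium() {
    # Install the Playwright Python package
    pip install playwright || return 1
    # Download a Chromium build managed by Playwright, together with the
    # system libraries it needs (--with-deps uses the OS package manager)
    playwright install --with-deps chromium
}
```

Call `install_playwright_chromium` from an init script or a `%sh` cell. Playwright then launches the browser build it downloaded, so there is no separate driver binary to version-match.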