Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to install Chromium Browser and ChromeDriver on DBX Runtime 10.4 and above?

ranged_coop
Valued Contributor II

Hi Team,

We are wondering whether there is a recommended way to install the Chromium browser and ChromeDriver on Databricks Runtime 10.4 and above?

I have been through the site and come across several links on this, but they all seem to install the Chromium browser from the official Canonical PPA (ppa:canonical-chromium-builds/stage).

FYI: please use the contents of the link below at your own risk:

(https://community.databricks.com/s/question/0D53f00001qEWtxCAG/chromedriver-installation-in-databricks)

From my understanding, DBX Runtime 10.4 is based on Ubuntu 20.04 (Focal), for which builds do not seem to be available in that PPA. Maybe this is because Ubuntu moved from shipping Chromium as an official deb to installing it as a snap?
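To check which Ubuntu release a cluster node actually runs, and whether the `chromium-browser` package that apt offers is only a wrapper around the snap, a couple of small helpers could be used, sketched below. The helper names `ubuntu_codename` and `apt_package_needs_snapd` are hypothetical, and the snap check assumes that the transitional deb declares `snapd` in its `Depends` field.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a Databricks cluster node.

# Print the Ubuntu codename (e.g. "focal" for 20.04) from an os-release file.
ubuntu_codename() {
  local file="${1:-/etc/os-release}"
  # os-release uses shell-compatible KEY=value lines; source it in a subshell
  # so its variables do not leak into the caller's environment.
  ( . "$file" && echo "${VERSION_CODENAME:-unknown}" )
}

# Succeed if the `apt-cache show` output on stdin lists snapd as a dependency.
# Assumption: the snap transitional package depends on snapd.
apt_package_needs_snapd() {
  grep -Eq '^(Pre-)?Depends:.*snapd'
}
```

On a node, `ubuntu_codename` prints the codename, and `apt-cache show chromium-browser | apt_package_needs_snapd && echo "snap wrapper"` reports whether the apt package would pull in the snap.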

So, our questions are as follows:

  1. What is the best way to install the Chromium browser on DBX Runtime 10.4?
  2. Should we consider the snap package, which seems to be available (`apt search chromium-browser`)? Will it work well with Selenium and the official ChromeDriver?
  3. Should we consider other sources for the Chromium browser, such as `ppa:phd/chromium-browser`, or Google Chrome directly? Will they be safe? Are there any license issues?
  4. Does Selenium support any other browsers, such as other Chromium-based browsers or Firefox with geckodriver?
22 REPLIES

Prabakar
Databricks Employee

Hi @Bharath Kumar Ramachandran @Vidula Khanna, I tried every way I could find to install Chromium, but each attempt failed at one step or another, and going back to troubleshoot them is very time-consuming. To keep it simple, you can try Databricks Container Services: install Chromium at the container level and use that image.

https://hub.docker.com/layers/standard/databricksruntime/standard/10.4-LTS/images/sha256-caee5e0d586...

https://docs.databricks.com/clusters/custom-containers.html
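As a rough illustration of the container approach, the shell function below prints a Dockerfile that starts from the `databricksruntime/standard:10.4-LTS` image linked above and installs Google Chrome from Google's `.deb` package. The function name, the download URL `https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb`, and the assumption that the base image builds as root are not from this thread, so treat this as a starting point rather than a tested image.

```shell
#!/bin/bash
# Hypothetical helper: print a Dockerfile for a Databricks custom container with Chrome.
# Assumptions: the base image builds as root, and Google's direct .deb URL is current.
write_chrome_dockerfile() {
  local base_image="${1:-databricksruntime/standard:10.4-LTS}"
  printf 'FROM %s\n\n' "$base_image"
  # Quoted heredoc: the body is emitted literally, including the line-continuation backslashes.
  cat <<'EOF'
# Install Google Chrome; apt resolves the .deb's dependencies from the Ubuntu archive.
RUN apt-get update \
 && apt-get install -y --no-install-recommends wget ca-certificates \
 && wget -q -O /tmp/google-chrome.deb https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb \
 && apt-get install -y /tmp/google-chrome.deb \
 && rm -f /tmp/google-chrome.deb \
 && rm -rf /var/lib/apt/lists/*
EOF
}
```

For example, `write_chrome_dockerfile > Dockerfile && docker build -t my-registry/dbr-chrome:10.4 .` (the image name is hypothetical). A matching ChromeDriver would still need to be added, for instance with the same steps as the init scripts further down this thread.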

ranged_coop
Valued Contributor II

Thank you so much for your response and apologies for all the trouble.

We had considered the suggested custom runtime option, but it was turned down because we cannot take on the effort of creating and maintaining a custom runtime at the moment. We would also have to keep up with the maintenance fixes that flow down from time to time. For now, we plan to either pause the requirement or move it to a different, existing option.

Thank you once again for all the help.

Hubert-Dudek
Esteemed Contributor III

Hi @Bharath Kumar Ramachandran, I've created a new version of the Selenium with Databricks manual. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL

Thank you, @Hubert Dudek, for your script.

If possible, I suggest including a reference to the link and description below, so that if Google changes the process in the future, users will still be able to figure it out.

https://www.chromium.org/getting-involved/download-chromium/

Not-as-easy steps:

  1. Head to https://commondatastorage.googleapis.com/chromium-browser-snapshots/index.html
  2. Choose your platform: Mac, Win, Linux, ChromiumOS
  3. Pick the Chromium build number you'd like to use
     - The latest one is mentioned in the LAST_CHANGE file
  4. Download the zip file containing Chromium
  5. There is a binary executable within to run
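The manual steps above can be scripted roughly as follows. This sketch assumes the bucket layout the steps describe, with Linux builds under a `Linux_x64` folder, the newest revision number in `Linux_x64/LAST_CHANGE`, and the browser archive named `chrome-linux.zip`. Those folder and file names are assumptions that may break if Google reorganizes the bucket, and the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers that script the manual Chromium snapshot download.
# Assumption: Linux builds live under Linux_x64/, with the newest revision number in
# Linux_x64/LAST_CHANGE and the browser archive at Linux_x64/<revision>/chrome-linux.zip.
SNAPSHOT_BASE="https://commondatastorage.googleapis.com/chromium-browser-snapshots"

# Print the URL of the LAST_CHANGE file for the Linux_x64 platform.
last_change_url() {
  echo "${SNAPSHOT_BASE}/Linux_x64/LAST_CHANGE"
}

# Print the URL of the Chromium archive for a given revision number.
chromium_zip_url() {
  echo "${SNAPSHOT_BASE}/Linux_x64/$1/chrome-linux.zip"
}

# Download the newest snapshot, unpack it under $1 (default /opt/chromium),
# and print the path of the chrome binary.
install_latest_chromium_snapshot() {
  local dest="${1:-/opt/chromium}" revision zip
  revision=$(wget -q -O - "$(last_change_url)")
  [ -n "$revision" ] || return 1
  zip=$(mktemp)
  wget -q -O "$zip" "$(chromium_zip_url "$revision")"
  mkdir -p "$dest"
  unzip -o -q "$zip" -d "$dest"
  rm -f "$zip"
  # Assumption: the archive unpacks to a chrome-linux/ folder containing the chrome binary.
  echo "${dest}/chrome-linux/chrome"
}
```

Snapshot builds are unversioned developer builds, so the binary may still need system libraries (such as `libnss3`) installed through apt before it starts.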

swrd
New Contributor III

Has anyone figured out how to get Selenium to work on Databricks seamlessly?

I'm beginning to question whether it's common for the Databricks support team not to get back to disgruntled users on time...

This request has been open for some months with no satisfactory response.

Does anyone know of an online resource that could guide us on this issue? Any response at this stage would be appreciated.

Anonymous
Not applicable

I had to install a Chromium-based browser on Databricks because the R package webshot2 requires one.

After trying a lot of things, I was able to install google-chrome using this script. Our cluster runs DBX Runtime 13.1, which is based on Ubuntu 22.04.2 LTS.

%sh
# Add Google's package signing key and add the Chrome repository to sources.list.d
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update

# You might have to do this if you run into an issue with a broken repository while pulling dependencies for your release.
wget http://ge.archive.ubuntu.com/ubuntu/pool/main/m/mesa/libgbm1_22.0.1-1ubuntu2_amd64.deb
sudo apt install -y ./libgbm1_22.0.1-1ubuntu2_amd64.deb

# Finally, install Chrome. Since the R function that invokes it has no option for passing arguments,
# I had to append --no-sandbox directly to the last line of the launcher script.
sudo apt install -y google-chrome-stable
sudo sed -i '$ s/$/'" --no-sandbox"'/' /opt/google/chrome/google-chrome
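Running the cell above a second time appends `--no-sandbox` to the launcher again and adds a duplicate line to `google.list`, because both the `sed` call and the `>>` redirect append unconditionally. A small idempotent variant is sketched below. The helper names `append_flag_once` and `add_line_once` are hypothetical, and the `sed` expression assumes the flag contains no `/` or `&` characters.

```shell
#!/bin/bash
# Hypothetical idempotent versions of two steps from the cell above.

# Append a flag to the last line of a launcher script unless that line already has it.
# Assumption: the flag contains no "/" or "&", which would break the sed expression.
append_flag_once() {
  local file="$1" flag="$2"
  if ! tail -n 1 "$file" | grep -qF -- "$flag"; then
    sed -i "\$ s/\$/ ${flag}/" "$file"
  fi
}

# Append a line to a file only if that exact line is not already present.
add_line_once() {
  local file="$1" line="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
}

# Example calls (they need root, as the sudo calls above suggest):
# add_line_once /etc/apt/sources.list.d/google.list "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main"
# append_flag_once /opt/google/chrome/google-chrome --no-sandbox
```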


Anonymous
Not applicable

If you decide to go with option 3 and use google-chrome, you could use this cluster init script.

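#!/bin/bash
# Stop at the first failing command so a broken install fails the cluster start visibly.
set -euo pipefail

# Legacy ChromeDriver download host (publishes drivers for Chrome versions up to 114).
export CHROME_DRIVER_URL=https://chromedriver.storage.googleapis.com

# Add Google's package signing key and the Chrome apt repository.
wget -q -O - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
sudo sh -c 'echo "deb [arch=amd64] http://dl.google.com/linux/chrome/deb/ stable main" >> /etc/apt/sources.list.d/google.list'
sudo apt-get update

# Install libgbm1 directly in case apt cannot pull it from the configured repositories.
wget http://ge.archive.ubuntu.com/ubuntu/pool/main/m/mesa/libgbm1_22.0.1-1ubuntu2_amd64.deb
sudo apt install -y ./libgbm1_22.0.1-1ubuntu2_amd64.deb

# Install Chrome and append --no-sandbox to the last line of its launcher script.
sudo apt install -y google-chrome-stable
sudo sed -i '$ s/$/'" --no-sandbox"'/' /opt/google/chrome/google-chrome

# Look up the ChromeDriver release for the installed Chrome major version, then download it.
CHROME_MAJOR=$(google-chrome --version | cut -d' ' -f3 | cut -d'.' -f1)
CHROME_DRIVER_VERSION=$(wget -q -O - "${CHROME_DRIVER_URL}/LATEST_RELEASE_${CHROME_MAJOR}")
wget -q "${CHROME_DRIVER_URL}/${CHROME_DRIVER_VERSION}/chromedriver_linux64.zip"

# Unpack the driver and put it on the PATH (-o and -f keep re-runs from failing).
sudo unzip -o chromedriver_linux64.zip -d /opt/chromedriver
sudo ln -sf /opt/chromedriver/chromedriver /bin/chromedriver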
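One caveat with the script above: the `chromedriver.storage.googleapis.com` host only publishes ChromeDriver builds up to Chrome 114. For Chrome 115 and later, ChromeDriver is distributed through Chrome for Testing, so the driver step could be adapted along the lines sketched below. The two Chrome for Testing base URLs are assumptions based on the endpoints published at the time of writing and should be checked before use; the function names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for fetching ChromeDriver from Chrome for Testing (Chrome 115+).
# Assumption: these are the Chrome for Testing endpoints at the time of writing.
CFT_INFO_URL="https://googlechromelabs.github.io/chrome-for-testing"
CFT_DOWNLOAD_URL="https://storage.googleapis.com/chrome-for-testing-public"

# Print the major version from text such as "Google Chrome 120.0.6099.109".
chrome_major_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1 | cut -d'.' -f1
}

# Print the linux64 ChromeDriver download URL for a full Chrome version.
chromedriver_zip_url() {
  echo "${CFT_DOWNLOAD_URL}/$1/linux64/chromedriver-linux64.zip"
}

# Download the ChromeDriver matching the installed Chrome and link it onto the PATH (run as root).
install_matching_chromedriver() {
  local major version zip
  major=$(chrome_major_version "$(google-chrome --version)")
  version=$(wget -q -O - "${CFT_INFO_URL}/LATEST_RELEASE_${major}")
  [ -n "$version" ] || return 1
  zip=$(mktemp)
  wget -q -O "$zip" "$(chromedriver_zip_url "$version")"
  unzip -o -q "$zip" -d /opt/chromedriver
  rm -f "$zip"
  # Chrome for Testing archives unpack into a chromedriver-linux64/ folder.
  ln -sf /opt/chromedriver/chromedriver-linux64/chromedriver /usr/local/bin/chromedriver
}
```

Selenium 4.6 and later also ship Selenium Manager, which can locate or download a matching driver when none is found on the PATH, so pinning the driver by hand may not be necessary there.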


Kaizen
Valued Contributor

Look into Playwright instead of Selenium. I went through the same process y'all went through here (I ended up writing an init script to install the drivers, etc.).

Playwright does all of this for you. Refer to this post; I hope it helps!!
https://community.databricks.com/t5/community-discussions/using-python-rpa-library-on-databricks/td-...


Oh, and you also no longer have to manage your driver versions 😉 that's a plus!
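For reference, installing Playwright and its bundled Chromium from a cluster init script might look like the sketch below. The interpreter path `/databricks/python/bin/python` is an assumption about where the cluster's Python environment lives (pass another path if your runtime differs), and the function name is hypothetical. `playwright install --with-deps chromium` downloads Playwright's own Chromium build together with the system packages it needs, which requires root.

```shell
#!/bin/bash
# Hypothetical helper for a cluster init script (init scripts run as root).
# Assumption: the cluster's Python lives at /databricks/python/bin/python.
install_playwright_chromium() {
  local python_bin="${1:-/databricks/python/bin/python}"
  # Install the Playwright package into the cluster's Python environment.
  "$python_bin" -m pip install --quiet playwright
  # Download Playwright's Chromium build plus the OS packages it depends on.
  "$python_bin" -m playwright install --with-deps chromium
}

# In the actual init script, call it once after the definition:
# install_playwright_chromium
```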
