
Errors Using Selenium/Chromedriver in DataBricks

Gray
Contributor

Hello,

I'm working in a notebook and trying to use the Python library Selenium to automate Chrome/chromedriver. I've managed to install Selenium using

%sh
 pip install selenium

I then attempt the following code, which results in the WebdriverException, copied below.

from selenium import webdriver
driver = webdriver.Chrome()

Error:

WebdriverException: Message: ‘chromedriver’ executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
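A quick way to confirm what this error is saying is to check whether the binaries are visible on the PATH of the driver node at all. The sketch below uses only the standard library. The helper name `find_binaries` and the list of candidate executable names are assumptions for illustration, not something from the original post.

```python
import shutil

# Candidate executable names are assumptions; adjust them for your image.
CANDIDATES = ("chromedriver", "google-chrome", "chromium-browser", "chromium")


def find_binaries(names=CANDIDATES):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_binaries().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If every entry comes back as not found, installing the Python package alone was not enough and the browser and driver still need to be installed on the cluster.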

After troubleshooting the error, I attempted instead to use webdriver-manager to install the instance of chromedriver as follows, whilst also running it headless.

%sh
pip install webdriver-manager

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless")

driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)

This time, I got the following error:

WebdriverException: Message: Service /root/.wdm/drivers/chromedriver/linux64/107.0.5304/chromedriver unexpectedly exited. Status code was: 127
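On Linux, exit status 127 is what the shell reports for "command not found", and the dynamic loader also exits with it when a binary cannot load a shared library it depends on. A downloaded chromedriver on an image that lacks Chrome's system libraries is therefore a plausible cause here, though that is an inference rather than something the error states. A minimal sketch for checking it, assuming `ldd` is available on the node; the helper names are hypothetical and the path in the usage comment is the one from the error message.

```python
import subprocess


def parse_missing_libs(ldd_output):
    """Return library names that ldd reported as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        if "not found" in line:
            # Lines look like: "\tlibnss3.so => not found"
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs(binary_path):
    """Run ldd on a binary and list its unresolved shared libraries."""
    proc = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return parse_missing_libs(proc.stdout)


# Example usage on the driver node (path taken from the error message):
# print(missing_libs("/root/.wdm/drivers/chromedriver/linux64/107.0.5304/chromedriver"))
```

Any library names it returns would need to be installed with the system package manager before chromedriver can start.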

I’ve roamed the internet for a solution, but no matter what I try, my code ends up throwing one of the two WebDriverException errors above. 

Does anybody know how I can get selenium running on DataBricks in order to automate Chrome/chromedriver?

Thanks!

1 ACCEPTED SOLUTION


Gray
Contributor

@Kaniz Fatma @Vidula Khanna @Hubert Dudek

My colleague and I were finally able to get Selenium running in a notebook. Although I can't explain in detail why this solution works, I have attached the source file below.

Hopefully this might help somebody in the future!

Cheers



Hubert-Dudek
Esteemed Contributor III

Maybe my manual on how to run selenium on Databricks will help:

In the cluster's Libraries tab, please install the PyPI package chromedriver-binary==83.0 (or higher; the versions pinned in the script below can probably be updated as well).

Please run the below script from the notebook to create "/databricks/scripts/selenium-install.sh" file.

dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
dbutils.fs.put("/databricks/scripts/selenium-install.sh","""
#!/bin/bash
apt-get update
apt-get install chromium-browser=91.0.4472.101-0ubuntu0.18.04.1 --yes
wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_linux64.zip -O /tmp/chromedriver.zip
mkdir /tmp/chromedriver
unzip /tmp/chromedriver.zip -d /tmp/chromedriver/
""", True)
display(dbutils.fs.ls("dbfs:/databricks/scripts/"))

Please add "/databricks/scripts/selenium-install.sh" as an init script in the cluster configuration so that it runs when the cluster starts.
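The script above pins the browser and the driver to the same version, and the two generally need to agree at least on the major version. A small sketch of a check that could be run once the cluster is up, assuming you have captured the `--version` output of each binary yourself; `major_version` and `versions_match` are hypothetical helper names.

```python
import re


def major_version(version_output):
    """Extract the major version number from `--version` output, or None."""
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    return int(match.group(1)) if match else None


def versions_match(browser_output, driver_output):
    """True when both outputs parse and share the same major version."""
    browser = major_version(browser_output)
    driver = major_version(driver_output)
    return browser is not None and browser == driver


# Example with version strings shaped like the ones pinned in the script:
print(versions_match("Chromium 91.0.4472.101", "ChromeDriver 91.0.4472.101"))
```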

Later in the notebook, you can use chrome, as in the below example.

from selenium import webdriver
chrome_driver = '/tmp/chromedriver/chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-dev-shm-usage') 
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
# prefs = {"download.default_directory":"/dbfs/tmp",
#          "download.prompt_for_download":False
# }
# chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
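The `executable_path` keyword used above is the Selenium 3 style. Selenium 4 deprecated it in favour of a `Service` object and later releases removed it, so on a newer Selenium the equivalent would look roughly like the sketch below. `build_args` and `make_driver` are hypothetical helper names, the driver path is the one used above, and the flag list is an assumption about what a root-run, headless container typically needs.

```python
def build_args(headless=True):
    """Chrome flags commonly needed when running as root in a container."""
    args = ["--no-sandbox", "--disable-dev-shm-usage"]
    if headless:
        args.append("--headless")
    return args


def make_driver(driver_path="/tmp/chromedriver/chromedriver"):
    """Create a Chrome driver using the Selenium 4 Service API."""
    # Imported here so build_args() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in build_args():
        options.add_argument(arg)
    return webdriver.Chrome(
        service=Service(executable_path=driver_path), options=options
    )


# Example usage in a notebook cell, once Chrome and chromedriver are installed:
# driver = make_driver()
# driver.get("https://example.com")
# print(driver.title)
# driver.quit()
```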

Hi Hubert,

Thank you for your quick response! I've copied your code across to my notebook. However, when I run the following code

%sh
/dbfs/databricks/scripts/selenium-install.sh

I get the following output

Hit:1 https://repos.azul.com/zulu/deb stable InRelease
Hit:2 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '91.0.4472.101-0ubuntu0.18.04.1' for 'chromium-browser' was not found
/dbfs/databricks/scripts/selenium-install.sh: line 5: --yes: command not found
--2022-11-03 13:02:23--  https://chromedriver.storage.googleapis.com/91.0.4472.101/
Resolving chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)... 209.85.202.128, 2a00:1450:400b:c01::80
Connecting to chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)|209.85.202.128|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-11-03 13:02:24 ERROR 404: Not Found.
 
/dbfs/databricks/scripts/selenium-install.sh: line 7: chromedriver_linux64.zip: command not found
mkdir: invalid option -- 'd'
Try 'mkdir --help' for more information.
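Messages such as "--yes: command not found" and "chromedriver_linux64.zip: command not found" suggest that long commands in the saved script were split across lines, so the shell tried to run the trailing fragments as commands of their own. A rough sketch of a check for that before registering a script as an init script; `suspicious_lines` is a hypothetical helper, and its single heuristic (a line that starts with an option flag) is an assumption tuned to this log, not a general shell linter.

```python
def suspicious_lines(script_text):
    """Return (line_number, line) pairs that look like wrapped command fragments."""
    flagged = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        stripped = line.strip()
        # A line starting with an option flag is very unlikely to be a command.
        if stripped.startswith("-"):
            flagged.append((number, stripped))
    return flagged


broken = "#!/bin/bash\napt-get update\napt-get install chromium-browser\n--yes\n"
print(suspicious_lines(broken))
```

Separately, the "Version ... was not found" line shows that the pinned package version (built for Ubuntu 18.04) is not available in the focal repositories this cluster uses, and the 404 shows that the download URL did not resolve either.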

And consequently, when I run this code block:

from selenium import webdriver
chrome_driver = '/tmp/chromedriver/chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
# prefs = {"download.default_directory":"/dbfs/tmp",
#          "download.prompt_for_download":False
# }
# chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)

I receive the following error:

WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home

Is this something you can shed some light on for me please?

Thank you for your help!

Anonymous
Not applicable

Hi @Henry Gray​ 

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!

Hubert-Dudek
Esteemed Contributor III

Hi @Henry Gray, I've created a new version of the manual for running Selenium on Databricks. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL

Kaniz
Community Manager

Hi @Henry Gray, we haven't heard from you since the last response from @Hubert Dudek, and I was checking back to see if his suggestions helped you.

If you have found a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Kaniz
Community Manager

Hi @Henry Gray​​, Thank you for sharing the solution with us.

It would mean a lot if you could select the "Best Answer" to help others find the correct answer faster.

This makes that answer appear right after the question, so it's easier to find within a thread.

It also helps us mark the question as answered so we can have more eyes helping others with unanswered questions.

luck_az
New Contributor III

Hi @Henry Gray, there is one command in your script that runs forever: [xvfb-run java -Dwebdriver.chrome.driver=/usr/bin/chromedriver -jar selenium-server.jar]. If I skip that command, my chromedriver does not work. Can you please suggest how to proceed?

Hi,

My colleague and I also found that this line ran indefinitely. We tinkered with the code and did the following to make it work.

1) Remove the following two portions of code:

%sh
wget https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar
mv selenium-server-4.1.2.jar selenium-server.jar

%sh
sudo apt install xvfb
xvfb-run java -Dwebdriver.chrome.driver=/usr/bin/chromedriver -jar selenium-server.jar

2) Add the following code to the beginning:

%sh
sudo rm -r /var/lib/apt/lists/* 
sudo apt clean && 
  sudo apt update --fix-missing -y &&
  sudo apt install -y  libmysqlclient21
sudo apt install -y gdal-bin

Additionally, FYI, our Databricks Runtime version is 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).

I'm not sure why this works, but hopefully it will fix your issues.

Cheers!

luck_az
New Contributor III

Thanks, it worked. Great work.

luck_az
New Contributor III

Hi @Henry Gray, I want to access a VPN using Selenium in Databricks. Do you have any idea how we can do that?

acristinar
New Contributor II

This solution saved my life! Thank you so much for posting it!

SShiv
New Contributor II

I tried this script but got the following response. How do I fix this?

[Screenshot: databricks_snip]

EB613
New Contributor II

I had the same issue. Try this, taken from this post:

%sh
sudo rm -r /var/lib/apt/lists/* 
sudo apt clean && 
   sudo apt update --fix-missing -y

 
