11-01-2022 04:16 AM
Hello,
I'm programming in a notebook and attempting to use the Python library Selenium to automate Chrome/chromedriver. I've successfully managed to install Selenium using
%sh
pip install selenium
I then attempt the following code, which results in the WebDriverException copied below.
from selenium import webdriver
driver = webdriver.Chrome()
Error:
WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
After troubleshooting the error, I attempted instead to use webdriver-manager to install the instance of chromedriver as follows, whilst also running it headless.
%sh
pip install webdriver-manager
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
This time, I got the following error:
WebDriverException: Message: Service /root/.wdm/drivers/chromedriver/linux64/107.0.5304/chromedriver unexpectedly exited. Status code was: 127
I've scoured the internet for a solution, but no matter what I try, my code ends up throwing one of the two WebDriverException errors above.
Does anybody know how I can get Selenium running on Databricks in order to automate Chrome/chromedriver?
Thanks!
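A note on the second error: an exit status of 127 from a freshly downloaded binary often means the dynamic loader could not find a shared library that the binary needs, which is common on minimal cluster images with no Chrome installed. The following is only a diagnostic sketch, not a fix. It assumes the `ldd` tool is available on the driver node, and the helper names `missing_shared_libs` and `check_driver` are made up for illustration.

```python
import subprocess
from typing import List


def missing_shared_libs(ldd_output: str) -> List[str]:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # An unresolved dependency is printed as: "libnss3.so => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def check_driver(path: str) -> List[str]:
    """Run ldd on the chromedriver binary and list its unresolved libraries."""
    result = subprocess.run(["ldd", path], capture_output=True, text=True)
    return missing_shared_libs(result.stdout)
```

Calling `check_driver("/root/.wdm/drivers/chromedriver/linux64/107.0.5304/chromedriver")` in a notebook cell would list any libraries that still need to be installed with apt before the driver can start.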
11-11-2022 02:04 AM
@Kaniz Fatma @Vidula Khanna @Hubert Dudek
My colleague and I were finally able to get Selenium running in a notebook. Although I can't explain in detail why this solution works, I have attached the source file below.
Hopefully this might help somebody in the future!
Cheers
11-01-2022 09:07 AM
Maybe my manual on how to run Selenium on Databricks will help:
In the cluster's Libraries tab, install the PyPI package chromedriver-binary==83.0 (or higher; the versions in the script can probably be updated as well).
Run the script below from a notebook to create the file "/databricks/scripts/selenium-install.sh".
# Create the DBFS folder that will hold the init script.
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
# The shebang must be the very first line of the file, so it follows the opening quotes directly.
dbutils.fs.put("/databricks/scripts/selenium-install.sh","""#!/bin/bash
apt-get update
apt-get install chromium-browser=91.0.4472.101-0ubuntu0.18.04.1 --yes
wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_linux64.zip -O /tmp/chromedriver.zip
mkdir -p /tmp/chromedriver
unzip /tmp/chromedriver.zip -d /tmp/chromedriver/
""", True)
# List the folder to confirm that the script was written.
display(dbutils.fs.ls("dbfs:/databricks/scripts/"))
Please add "/databricks/scripts/selenium-install.sh" as an init script in the cluster configuration.
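Because the shell script is embedded in a Python string, a copy and paste can easily wrap a long command onto a second line or push the shebang off the first line. A small sanity check before uploading can catch this. The sketch below is only an illustration: `lint_init_script` is a hypothetical helper, and the two checks it makes are heuristics rather than a full shell parser.

```python
from typing import List


def lint_init_script(script: str) -> List[str]:
    """Return a list of likely copy-paste problems found in a shell script."""
    problems = []
    lines = script.split("\n")
    # The interpreter line is only honoured when it is the first line of the file.
    if not lines[0].startswith("#!"):
        problems.append("shebang is not on the first line")
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # A line that begins with an option flag is usually the tail of a wrapped command.
        if stripped.startswith("-"):
            problems.append(f"line {number} starts with an option: {stripped}")
    return problems
```

An empty list means neither heuristic fired; anything else is worth fixing before the script is registered as an init script.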
Later in the notebook, you can use Chrome as in the example below.
from selenium import webdriver
chrome_driver = '/tmp/chromedriver/chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
# prefs = {"download.default_directory":"/dbfs/tmp",
# "download.prompt_for_download":False
# }
# chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
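Readers on newer Selenium 4 releases may find that the `executable_path` keyword is rejected, since Selenium 4 moved the driver location into a `Service` object. The following is a sketch of the same setup in that style, assuming Selenium 4 is installed and the driver was unpacked to /tmp/chromedriver as above. The function names `chrome_arguments` and `make_driver` are made up for illustration.

```python
from typing import List


def chrome_arguments(user_data_dir: str = "/dbfs/selenium") -> List[str]:
    """Command-line switches for running Chrome headless on a cluster node."""
    return ["--no-sandbox", "--headless", f"--user-data-dir={user_data_dir}"]


def make_driver(driver_path: str = "/tmp/chromedriver/chromedriver"):
    """Start Chrome using the Selenium 4 Service object for the driver path."""
    # Imported here so chrome_arguments() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)
```

With this in place, `driver = make_driver()` replaces the final line of the example above.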
11-03-2022 06:10 AM
Hi Hubert,
Thank you for your quick response! I've copied your code across to my notebook. However, when I run the following code
%sh
/dbfs/databricks/scripts/selenium-install.sh
I get the following output
Hit:1 https://repos.azul.com/zulu/deb stable InRelease
Hit:2 http://security.ubuntu.com/ubuntu focal-security InRelease
Hit:3 http://archive.ubuntu.com/ubuntu focal InRelease
Hit:4 http://archive.ubuntu.com/ubuntu focal-updates InRelease
Hit:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
E: Version '91.0.4472.101-0ubuntu0.18.04.1' for 'chromium-browser' was not found
/dbfs/databricks/scripts/selenium-install.sh: line 5: --yes: command not found
--2022-11-03 13:02:23-- https://chromedriver.storage.googleapis.com/91.0.4472.101/
Resolving chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)... 209.85.202.128, 2a00:1450:400b:c01::80
Connecting to chromedriver.storage.googleapis.com (chromedriver.storage.googleapis.com)|209.85.202.128|:443... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-11-03 13:02:24 ERROR 404: Not Found.
/dbfs/databricks/scripts/selenium-install.sh: line 7: chromedriver_linux64.zip: command not found
mkdir: invalid option -- 'd'
Try 'mkdir --help' for more information.
And consequently, when I run this code block:
from selenium import webdriver
chrome_driver = '/tmp/chromedriver/chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
# prefs = {"download.default_directory":"/dbfs/tmp",
# "download.prompt_for_download":False
# }
# chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
I receive the following error:
WebDriverException: Message: 'chromedriver' executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
Is this something you can shed some light on for me please?
Thank you for your help!
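The install output above shows that the download returned a 404, so nothing was unpacked into /tmp/chromedriver, and Selenium tends to report a missing driver file with the same "needs to be in PATH" message. Checking the path before creating the driver makes that failure easier to read. This is a minimal sketch, and `driver_problem` is a hypothetical helper name.

```python
import os


def driver_problem(path: str) -> str:
    """Describe what is wrong with the driver path, or return '' if it looks usable."""
    if not os.path.isfile(path):
        return f"{path} is not an existing file"
    if not os.access(path, os.X_OK):
        return f"{path} is not executable"
    return ""
```

Running `print(driver_problem('/tmp/chromedriver/chromedriver'))` before `webdriver.Chrome(...)` shows whether the init script actually produced the binary.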
11-06-2022 12:12 AM
Hi @Henry Gray
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
11-09-2022 06:25 AM
Hi @Henry Gray. I've created a new version of the manual for running Selenium on Databricks. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL
12-01-2022 07:06 AM
Hi @Henry Gray, there is one command in your script that runs forever. If I skip that command, my chromedriver does not work:
xvfb-run java -Dwebdriver.chrome.driver=/usr/bin/chromedriver -jar selenium-server.jar
Can you please suggest how to proceed?
12-01-2022 07:19 AM
Hi,
My colleague and I also found that this line ran indefinitely. We tinkered with the code and did the following to make it work.
1) Remove the following two portions of code:
%sh
wget https://github.com/SeleniumHQ/selenium/releases/download/selenium-4.1.0/selenium-server-4.1.2.jar
mv selenium-server-4.1.2.jar selenium-server.jar
%sh
sudo apt install xvfb
xvfb-run java -D webdriver.chrome.driver=/usr/bin/chromedriver -jar selenium-server.jar
2) Add the following code to the beginning:
%sh
sudo rm -r /var/lib/apt/lists/*
sudo apt clean &&
sudo apt update --fix-missing -y &&
sudo apt install -y libmysqlclient21
sudo apt install -y gdal-bin
Additionally, FYI, our Databricks runtime version is 10.4 LTS (includes Apache Spark 3.2.1, Scala 2.12).
I'm not sure why this works, but hopefully it will fix your issues.
Cheers!
12-01-2022 07:56 AM
Thanks, it worked. Great work.
12-04-2022 11:27 PM
Hi @Henry Gray, I want to access a VPN using Selenium in Databricks. Do you have any idea how we can do that?
12-22-2022 09:58 AM
This solution saved my life! Thank you so much for posting it!
06-27-2023 06:20 AM
I had the same issue. Try this (from this post):
%sh
sudo rm -r /var/lib/apt/lists/*
sudo apt clean &&
sudo apt update --fix-missing -y
11-08-2023 11:08 PM - edited 11-08-2023 11:10 PM
Hi Gray, I was looking for your script, but it seems there is no longer a file attached to your reply. Would really love your help on this!
01-18-2024 10:34 AM
The attached source file seems to be missing.
Also, what cluster access mode are you running? Shared doesn't let us access the file system, since it is protected, which results in an error like:
WebDriverException: Message: Can not connect to the Service /databricks/.pyenv/bin/chromedriver