11-01-2022 04:16 AM
Hello,
I’m programming in a notebook and attempting to use the Python library Selenium to automate Chrome/chromedriver. I’ve successfully managed to install Selenium using
%sh
pip install selenium
I then attempt the following code, which results in the WebdriverException, copied below.
from selenium import webdriver
driver = webdriver.Chrome()
Error:
WebdriverException: Message: ‘chromedriver’ executable needs to be in PATH. Please see https://chromedriver.chromium.org/home
After troubleshooting the error, I attempted instead to use webdriver-manager to install the instance of chromedriver as follows, whilst also running it headless.
%sh
pip install webdriver-manager
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("--headless")
driver = webdriver.Chrome(ChromeDriverManager().install(), options=options)
This time, I got the following error:
WebdriverException: Message: Service /root/.wdm/drivers/chromedriver/linux64/107.0.5304/chromedriver unexpectedly exited. Status code was: 127
I’ve roamed the internet for a solution, but no matter what I try, my code ends up throwing one of the two WebDriverException errors above.
Does anybody know how I can get Selenium running on Databricks in order to automate Chrome/chromedriver?
Thanks!
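A note on the second error: on Linux, exit status 127 from a freshly downloaded binary commonly means the dynamic loader could not find a shared library the binary needs. Whether that is the cause here is an assumption, not something confirmed in this thread. A minimal sketch for checking it, where `missing_shared_libs` and `ldd_report` are hypothetical helper names and `ldd` is assumed to be available on the cluster:

```python
import subprocess
from typing import List


def missing_shared_libs(ldd_output: str) -> List[str]:
    """Return the library names that `ldd` reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # ldd prints unresolved libraries as: "libnss3.so => not found"
        if "=> not found" in line:
            missing.append(line.split("=>")[0].strip())
    return missing


def ldd_report(binary_path: str) -> str:
    """Run `ldd` on a binary and return its combined output (Linux only)."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return result.stdout + result.stderr


# Canned, made-up ldd output so the parsing can be tried anywhere.
sample = """\
    linux-vdso.so.1 (0x00007ffd5a1f2000)
    libnss3.so => not found
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1c2a400000)
    libnspr4.so => not found
"""
print(missing_shared_libs(sample))  # ['libnss3.so', 'libnspr4.so']
```

On the cluster, `missing_shared_libs(ldd_report("/root/.wdm/drivers/chromedriver/linux64/107.0.5304/chromedriver"))` would list whatever the driver cannot resolve, if anything.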
03-13-2024 04:29 AM
Hi @Gray ,
I can't find your attached source file. It might be helpful, as I am facing the same issue.
Thanks,
03-13-2024 05:53 AM
03-13-2024 06:57 PM
Thank you!
Which Databricks runtime version did you use?
I am facing some trouble with apt-get, due to security I think, so it still fails at that step.
03-14-2024 06:34 AM
I used 10.4 ML.
03-13-2024 11:39 PM
03-13-2024 08:41 AM
This is what I'm currently using to install it.
03-13-2024 08:42 AM
Also check out Playwright; it's a lot easier to install.
https://community.databricks.com/t5/community-discussions/using-python-rpa-library-on-databricks/td-...
03-15-2024 12:36 AM
@Kaizen , @Evan_MCK : I put together a notebook from the elements collected from your posts. It works.
# imports needed for notebook
from datetime import datetime
import dateutil.relativedelta
import os
import time
import urllib.request, json
def get_latest_driver_url():
    with urllib.request.urlopen("https://googlechromelabs.github.io/chrome-for-testing/last-known-good-versions-with-downloads.json") as url:
        data = json.load(url)
        print(data['channels']['Stable']['version'])
        url = data['channels']['Stable']['downloads']['chromedriver'][0]['url']
        # print(url)
        # set the url as environment variable to use in scripting
        # os.environ['latest_chromedriver_url']= url
        return url
latest_chromedriver_url = get_latest_driver_url()
print(latest_chromedriver_url)
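The function above takes the first entry (`[0]`) of the chromedriver download list and relies on it being the Linux build. A slightly more defensive sketch selects the entry by its `platform` field instead. The helper name `pick_chromedriver_url` is hypothetical, and the sample data below is a trimmed, made-up stand-in with placeholder URLs, assumed to follow the same shape as the feed:

```python
def pick_chromedriver_url(data: dict, platform: str = "linux64", channel: str = "Stable") -> str:
    """Return the chromedriver download URL for one platform in one channel."""
    downloads = data["channels"][channel]["downloads"]["chromedriver"]
    for entry in downloads:
        if entry["platform"] == platform:
            return entry["url"]
    raise LookupError(f"no chromedriver download listed for platform {platform!r}")


# Made-up sample; the version and URLs are placeholders, not real downloads.
sample_feed = {
    "channels": {
        "Stable": {
            "version": "0.0.0.0",
            "downloads": {
                "chromedriver": [
                    {"platform": "mac-x64", "url": "https://example.invalid/mac-x64/chromedriver-mac-x64.zip"},
                    {"platform": "linux64", "url": "https://example.invalid/linux64/chromedriver-linux64.zip"},
                ]
            },
        }
    }
}

print(pick_chromedriver_url(sample_feed))
```

Inside `get_latest_driver_url()`, the same lookup could replace the `[0]` indexing, so a reordering of the feed raises a clear error instead of silently downloading a driver for another OS.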
Make an init script
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
# Start the string with the shebang so it is the first line of the script file.
dbutils.fs.put("/databricks/scripts/selenium-install.sh", f"""#!/bin/bash
latest_chromedriver_url="{latest_chromedriver_url}"
# -N is dropped: wget ignores timestamping when -O is given
wget "$latest_chromedriver_url" -O /tmp/chromedriver_linux64.zip
rm -rf /tmp/chromedriver/
unzip /tmp/chromedriver_linux64.zip -d /tmp/chromedriver/
sudo apt-get clean && sudo apt-get update --fix-missing -y
# documented form: 'apt-key add -' reads the key from stdin
curl -sS -o - https://dl-ssl.google.com/linux/linux_signing_key.pub | sudo apt-key add -
# tee -a appends with root rights; a '>>' redirect is not covered by sudo
echo "deb https://dl.google.com/linux/chrome/deb/ stable main" | sudo tee -a /etc/apt/sources.list.d/google-chrome.list
sudo apt-get -y update
sudo apt-get -y install google-chrome-stable
""", True)
display(dbutils.fs.ls("dbfs:/databricks/scripts/"))
Run it (or add it to the cluster's init script configuration so it runs automatically at cluster start)
%sh
/dbfs/databricks/scripts/selenium-install.sh
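After the script has run, a quick check from Python can confirm that both pieces landed before trying to start a browser. This is a minimal sketch: `check_install` is a hypothetical helper, the chromedriver path is the one used later in this notebook, and `google-chrome` as the command name installed by the `google-chrome-stable` package is an assumption.

```python
import os
import shutil
from typing import Dict


def check_install(chromedriver_path: str, chrome_command: str = "google-chrome") -> Dict[str, bool]:
    """Report whether chromedriver is in place and Chrome is on PATH."""
    return {
        "chromedriver_exists": os.path.isfile(chromedriver_path),
        "chromedriver_executable": os.access(chromedriver_path, os.X_OK),
        # Assumption: the Chrome package exposes this command name on PATH.
        "chrome_on_path": shutil.which(chrome_command) is not None,
    }


print(check_install("/tmp/chromedriver/chromedriver-linux64/chromedriver"))
```

If any value is `False`, the output of the `%sh` cell above is the place to look, since a failed `apt-get` step leaves chromedriver unzipped but Chrome itself missing.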
Install Selenium and restart the Python kernel (or add it as a PyPI library on the cluster so it is installed at cluster start)
%pip install selenium
dbutils.library.restartPython()
Init the driver
# imports needed for notebook
from datetime import datetime
import dateutil.relativedelta
import os
import time
import urllib.request, json
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.support.ui import WebDriverWait
def init_chrome_browser(download_path, chrome_driver_path, url):
    options = Options()
    prefs = {'download.default_directory' : download_path, 'profile.default_content_setting_values.automatic_downloads': 1, "download.prompt_for_download": False,
             "download.directory_upgrade": True, "safebrowsing.enabled": True ,
             "translate_whitelists": {"vi":"en"},
             "translate":{"enabled":"true"}}
    options.add_experimental_option('prefs', prefs)
    options.add_argument('--no-sandbox')
    options.add_argument('--headless') # required on Databricks: there is no display to render the browser
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--start-maximized')
    options.add_argument('window-size=2560,1440')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--ignore-ssl-errors')
    options.add_argument('--lang=en')
    options.add_experimental_option('excludeSwitches', ['enable-logging'])
    print(f"{datetime.now()} Launching Chrome...")
    browser = webdriver.Chrome(service=Service(chrome_driver_path), options=options)
    print(f"{datetime.now()} Chrome launched.")
    browser.get(url)
    print(f"{datetime.now()} Browser ready to use.")
    return browser

driver = init_chrome_browser(
    download_path="/tmp/downloads",
    chrome_driver_path="/tmp/chromedriver/chromedriver-linux64/chromedriver",
    url= "https://www.google.com"
)
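If the same flags are needed in several notebooks, the arguments passed to `options.add_argument` above can be collected in a small helper. This is only a sketch of one way to organise them: `build_chrome_args` is a hypothetical name, and it covers just a subset of the flags used above.

```python
from typing import List, Tuple


def build_chrome_args(window_size: Tuple[int, int] = (2560, 1440),
                      headless: bool = True,
                      lang: str = "en") -> List[str]:
    """Build the list of Chrome command-line arguments as plain strings."""
    width, height = window_size
    args = [
        "--no-sandbox",
        "--disable-dev-shm-usage",
        f"--window-size={width},{height}",
        f"--lang={lang}",
    ]
    if headless:
        # Put --headless first so it is easy to spot in logs.
        args.insert(0, "--headless")
    return args


print(build_chrome_args())

# Applying them to Selenium options (requires selenium to be installed):
# options = Options()
# for arg in build_chrome_args():
#     options.add_argument(arg)
```

Keeping the list as plain strings means it can be inspected or logged without starting a browser.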
Test it
from selenium.webdriver.common.by import By
driver.find_element(By.CSS_SELECTOR, "img").get_attribute("alt")
Close the driver
driver.quit()
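If a cell fails before `driver.quit()` is reached, the Chrome process can be left running on the driver node. One way to guard against that is a small context manager that always calls `quit()`. This is a sketch with hypothetical names (`managed_driver`, `FakeDriver`); the fake class stands in for a real browser so the pattern can be tried without Chrome.

```python
from contextlib import contextmanager


@contextmanager
def managed_driver(factory):
    """Create a driver via factory() and always call quit() on exit."""
    driver = factory()
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a Selenium driver, used only for this demonstration."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with managed_driver(lambda: fake) as d:
        raise RuntimeError("simulated failure mid-scrape")
except RuntimeError:
    pass

print(fake.quit_called)  # True: quit() ran despite the exception
```

With the notebook above, the factory would be something like `lambda: init_chrome_browser(download_path, chrome_driver_path, url)`.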
12-15-2022 03:36 AM
I also tried the script and am getting a similar error. Can anyone suggest a resolution?
The errors are "Failed to fetch http://archive.ubuntu.com/ubuntu/pool/main/s/systemd/udev_245.4-4ubuntu3.18_amd64.deb" and "Unable to fetch some archives".
06-27-2023 06:20 AM
I had the same issue. Try this, as I answered in a previous question:
%sh
sudo rm -r /var/lib/apt/lists/*
sudo apt clean && sudo apt update --fix-missing -y