How does Selenium WebDriver work on Azure Databricks? I am unable to run even simple code.

Prabhakar1
New Contributor III

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options

drivers = webdriver.Chrome(ChromeDriverManager().install())
drivers.get("https://www.google.co.in/")
drivers.find_element(By.NAME, "q").send_keys("Prabhakar Kumar Jha")

Error message

WebDriverException: Message: Service /root/.wdm/drivers/chromedriver/linux64/103.0.5060/chromedriver unexpectedly exited. Status code was: 127
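Exit status 127 usually means the process could not be started at all: a shell returns 127 for a command it cannot find, and the Linux dynamic loader also exits with 127 when a binary depends on a shared library that is not installed. One way to check the second case is to run `ldd` on the driver and look for "not found" entries. A minimal sketch, assuming `ldd` is available on the driver node (it is on standard Ubuntu images); the helper names are hypothetical and the path is the one from the error message:

```python
import subprocess
from typing import List

# Path taken from the error message above; adjust to wherever the driver lives.
DRIVER_PATH = "/root/.wdm/drivers/chromedriver/linux64/103.0.5060/chromedriver"


def parse_missing_libs(ldd_output: str) -> List[str]:
    """Return the shared-library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        line = line.strip()
        # ldd prints e.g. "libnss3.so => not found" for unresolved libraries.
        if line.endswith("not found"):
            missing.append(line.split("=>")[0].strip())
    return missing


def missing_libs_for(binary_path: str) -> List[str]:
    """Run ldd on the binary and list any shared libraries it cannot resolve."""
    result = subprocess.run(["ldd", binary_path], capture_output=True, text=True)
    return parse_missing_libs(result.stdout)
```

In a notebook cell, `missing_libs_for(DRIVER_PATH)` lists the unresolved libraries. If it names libraries such as `libnss3.so`, installing a browser package through apt (which pulls those libraries in as dependencies, as the init script in the reply below does) is the usual way to satisfy them.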


Hubert-Dudek
Esteemed Contributor III

Hi @Prabhakar Jha,

In the cluster's Libraries tab, please install the PyPI package chromedriver-binary==83.0 (or higher; the versions in the script below can probably be updated as well).
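Versions matter here because, in practice, ChromeDriver only drives a Chrome/Chromium build with the same major version (otherwise it fails with "This version of ChromeDriver only supports Chrome version N"); the init script below pins both to the same 91.0.4472.101 build. A small check, assuming the binaries print their version with `--version` as Chrome, Chromium and ChromeDriver do; the helper names are hypothetical:

```python
import re
import subprocess
from typing import Optional

# Matches versions like 91.0.4472.101; group 1 is the major version.
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+(?:\.\d+)?")


def major_version(version_output: str) -> Optional[int]:
    """Extract the major version from output like 'ChromeDriver 91.0.4472.101 (...)'."""
    match = _VERSION_RE.search(version_output)
    return int(match.group(1)) if match else None


def versions_match(browser_output: str, driver_output: str) -> bool:
    """True when the browser and the driver report the same major version."""
    browser = major_version(browser_output)
    driver = major_version(driver_output)
    return browser is not None and browser == driver


def installed_versions_match(browser_cmd: str, driver_cmd: str) -> bool:
    """Run both binaries with --version and compare their major versions."""
    browser = subprocess.run([browser_cmd, "--version"], capture_output=True, text=True).stdout
    driver = subprocess.run([driver_cmd, "--version"], capture_output=True, text=True).stdout
    return versions_match(browser, driver)
```

On the cluster, something like `installed_versions_match("chromium-browser", "/tmp/chromedriver/chromedriver")` would compare the two binaries the init script installs.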

Please add /databricks/scripts/selenium-install.sh as a cluster init script; you can create it with the first code snippet below.

Then, in the Databricks notebook, use something similar to the second snippet below.

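# Notebook cell: write the init script to DBFS (run once, then attach it to the cluster and restart).
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
# The shebang must be the very first line of the script, so the string starts right after the quotes.
dbutils.fs.put("/databricks/scripts/selenium-install.sh", """#!/bin/bash
apt-get update
# Chromium and ChromeDriver are pinned to the same 91.0.4472.101 build.
apt-get install chromium-browser=91.0.4472.101-0ubuntu0.18.04.1 --yes
wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_linux64.zip -O /tmp/chromedriver.zip
mkdir -p /tmp/chromedriver
unzip -o /tmp/chromedriver.zip -d /tmp/chromedriver/
""", True)
display(dbutils.fs.ls("dbfs:/databricks/scripts/"))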
# Notebook cell (after the cluster has restarted with the init script): start headless Chrome.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service  # Selenium 4; on Selenium 3 pass executable_path= instead

chrome_driver = '/tmp/chromedriver/chromedriver'  # unzipped there by the init script
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
# prefs = {"download.default_directory": "/dbfs/tmp",
#          "download.prompt_for_download": False
# }
# chrome_options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome(service=Service(chrome_driver), options=chrome_options)
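The init script only runs if it is registered on the cluster, normally in the init scripts section of the cluster configuration; init scripts run at cluster start, so the cluster has to be restarted afterwards. For clusters defined as JSON (the Clusters API or a job's `new_cluster` spec), the same registration is the `init_scripts` list. A sketch that only builds that fragment, assuming the DBFS location used above; the helper name is hypothetical, and newer Databricks releases deprecate DBFS-hosted init scripts in favor of workspace files or Unity Catalog volumes, so check which destinations your workspace accepts:

```python
import json

SCRIPT_PATH = "dbfs:/databricks/scripts/selenium-install.sh"


def init_scripts_fragment(script_path: str) -> dict:
    """Build the init_scripts part of a cluster spec pointing at a DBFS script."""
    return {"init_scripts": [{"dbfs": {"destination": script_path}}]}


# Merge this into the rest of the cluster specification before submitting it.
print(json.dumps(init_scripts_fragment(SCRIPT_PATH), indent=2))
```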

Hi Hubert,

That resolved the issue, but now I get a new error with this code:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

chrome_driver = '/tmp/chromedriver/chromedriver'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.google.com/')

But I get this error:

Message: unknown error: Chrome failed to start: exited abnormally.
(unknown error: DevToolsActivePort file doesn't exist)
(The process started from chrome location /usr/bin/google-chrome is no longer running, so ChromeDriver is assuming that Chrome has crashed.)
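One thing worth checking in the snippet above: `chrome_options` is built but never passed to `webdriver.Chrome()`, so Chrome starts without `--headless` and `--no-sandbox`, which on a cluster node with no display is a common trigger for this DevToolsActivePort error. A sketch with the options wired in, assuming Selenium 4 (where `Service` lives in `selenium.webdriver.chrome.service`) and the `/tmp/chromedriver/chromedriver` path from the init script; `make_driver` and `build_chrome_args` are hypothetical helpers, and the selenium imports are deferred so the argument list can be inspected without Selenium installed:

```python
from typing import List, Optional

# --headless and --no-sandbox are what Chrome usually needs on a node with no display;
# --disable-dev-shm-usage makes Chrome use /tmp instead of a possibly small /dev/shm.
HEADLESS_ARGS = ["--no-sandbox", "--headless", "--disable-dev-shm-usage"]


def build_chrome_args(extra: Optional[List[str]] = None) -> List[str]:
    """Return the Chrome command-line arguments, skipping duplicates from `extra`."""
    args = list(HEADLESS_ARGS)
    for arg in extra or []:
        if arg not in args:
            args.append(arg)
    return args


def make_driver(driver_path: str = "/tmp/chromedriver/chromedriver",
                binary_location: Optional[str] = None,
                extra_args: Optional[List[str]] = None):
    """Start Chrome with the options actually attached (Selenium 4 API)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(extra_args):
        options.add_argument(arg)
    if binary_location:
        # e.g. "/usr/bin/chromium-browser" if that is the browser the driver was built for
        options.binary_location = binary_location
    # The difference from the snippet above: options=... is passed to Chrome.
    return webdriver.Chrome(service=Service(driver_path), options=options)
```

The snippet's `--homedir`/`--user-data-dir` flags can be passed through `extra_args` if they are still wanted. Setting `binary_location` is only relevant if, as the error message suggests, Selenium picks up `/usr/bin/google-chrome` instead of the Chromium installed by the init script.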

Hi Prabhakar and @Hubert Dudek,

Did you find a solution to the new error "Message: unknown error: Chrome failed to start: exited abnormally."?

I am stuck with the same error. I have tried to get Selenium working on Databricks, without success.

Hubert-Dudek
Esteemed Contributor III

Hi @Prabhakar Jha, I've written an updated guide to using Selenium with Databricks. Please see https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL

Evan_MCK
Contributor

I also got that error. What worked for me was downloading ChromeDriver with shell commands in the same notebook I used for web scraping, making sure it was the latest version. I could not use webdriver-manager. You can see all the details here: https://stackoverflow.com/questions/69192050/using-selenium-within-databricks-chrome-not-reachable/7...
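The idea of fetching a matching ChromeDriver yourself instead of relying on webdriver-manager can be sketched in Python by building the download URLs directly. This assumes the chromedriver.storage.googleapis.com layout used by the init script earlier in the thread, including its `LATEST_RELEASE_<major>` files; that bucket only covers ChromeDriver up to version 114, and newer releases are published through the Chrome for Testing endpoints instead. The helper names are hypothetical, and this is not necessarily the exact script from the linked answer:

```python
import urllib.request

BASE = "https://chromedriver.storage.googleapis.com"


def latest_release_url(major: int) -> str:
    """URL of the text file holding the newest ChromeDriver build for a major version."""
    return f"{BASE}/LATEST_RELEASE_{major}"


def driver_zip_url(version: str) -> str:
    """Download URL of the Linux ChromeDriver zip (same layout as the init script's wget)."""
    return f"{BASE}/{version}/chromedriver_linux64.zip"


def resolve_driver_zip_url(major: int, timeout: float = 10.0) -> str:
    """Look up the newest build for `major` and return its zip URL (needs network access)."""
    with urllib.request.urlopen(latest_release_url(major), timeout=timeout) as resp:
        version = resp.read().decode("utf-8").strip()
    return driver_zip_url(version)
```

For example, `resolve_driver_zip_url(91)` returns a URL that can then be downloaded and unzipped the same way the init script does with `wget` and `unzip`.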
