Data Engineering

ChromeDriver installation in Databricks

AJ270990
Contributor II

I am working on web-scraping logic and need to install ChromeDriver. How can I install it in a Databricks notebook?

Accepted Solution

Evan_MCK
Contributor

What worked for me was downloading ChromeDriver with shell scripts in the same notebook I used for web scraping, making sure it was the latest version. You can see all the details here: https://stackoverflow.com/questions/69192050/using-selenium-within-databricks-chrome-not-reachable/7...
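Each ChromeDriver release supports the Chrome/Chromium build with the same major version number, so a browser/driver mismatch is a common reason the driver refuses to start. A minimal sketch of that check, assuming the browser binary is `chromium-browser` and the driver sits at `/tmp/chromedriver/chromedriver` (as in the init script in the next reply), and that both print a four-part version number for `--version`. The helper names `major_version` and `versions_match` are hypothetical:

```python
import re
import subprocess


def major_version(version_output: str) -> int:
    """Extract the major version from output such as
    'Chromium 91.0.4472.101 Built on Ubuntu' or 'ChromeDriver 91.0.4472.101 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(browser_bin: str, driver_bin: str) -> bool:
    """Run `<binary> --version` for browser and driver and compare major versions."""
    browser = subprocess.run([browser_bin, "--version"],
                             capture_output=True, text=True, check=True).stdout
    driver = subprocess.run([driver_bin, "--version"],
                            capture_output=True, text=True, check=True).stdout
    return major_version(browser) == major_version(driver)
```

In a notebook, `versions_match("chromium-browser", "/tmp/chromedriver/chromedriver")` returning `False` would suggest downloading the driver release that matches the installed browser.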


4 Replies

Hubert-Dudek
Esteemed Contributor III

@Abhishek Jain, in the cluster's Libraries tab, install the PyPI package chromedriver-binary==83.0 (or higher; the versions pinned in the init script can probably be updated as well).

Next, add dbfs:/databricks/scripts/selenium-install.sh as a cluster init script. Create that file by running the first code block below in a notebook, then restart the cluster so the init script runs.

Then, in the Databricks notebook where you scrape, use something similar to the second code block below.

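```python
# Notebook cell: write the cluster init script to DBFS (run once)
dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
# The shebang must be the very first line, so the string starts right after the quotes
dbutils.fs.put("/databricks/scripts/selenium-install.sh", """#!/bin/bash
# Chromium and ChromeDriver versions must match (both 91.0.4472.101 here)
apt-get update
apt-get install chromium-browser=91.0.4472.101-0ubuntu0.18.04.1 --yes
wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_linux64.zip -O /tmp/chromedriver.zip
mkdir -p /tmp/chromedriver
unzip -o /tmp/chromedriver.zip -d /tmp/chromedriver/
""", True)  # True = overwrite an existing file
display(dbutils.fs.ls("dbfs:/databricks/scripts/"))
```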
```python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

chrome_driver = '/tmp/chromedriver/chromedriver'  # unzipped there by the init script
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--headless')
# chrome_options.add_argument('--disable-dev-shm-usage')
chrome_options.add_argument('--homedir=/dbfs/tmp')
chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
# prefs = {"download.default_directory": "/dbfs/tmp",
#          "download.prompt_for_download": False}
# chrome_options.add_experimental_option("prefs", prefs)

# Selenium 4 takes the driver path via a Service object;
# the old executable_path argument was removed in Selenium 4.10
driver = webdriver.Chrome(service=Service(chrome_driver), options=chrome_options)
```
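Two things can go wrong with this setup: the notebook may run before the cluster has restarted with the init script, so the driver binary is not there yet, and a failed scrape can leave a headless Chrome process running because `driver.quit()` is never reached. A minimal sketch of a guard for both, assuming the driver path used by the init script; `managed_driver` is a hypothetical helper, not part of Selenium or Databricks:

```python
import os
from contextlib import contextmanager


@contextmanager
def managed_driver(factory, driver_path="/tmp/chromedriver/chromedriver"):
    """Check that the ChromeDriver binary is present and executable,
    build a driver with `factory`, and always call quit() on exit."""
    if not (os.path.isfile(driver_path) and os.access(driver_path, os.X_OK)):
        raise FileNotFoundError(
            f"{driver_path} is missing or not executable; "
            "was the cluster restarted after adding the init script?")
    driver = factory()
    try:
        yield driver
    finally:
        # Runs even if the scraping code raises, so Chrome is shut down
        driver.quit()
```

In the notebook, `factory` could be `lambda: webdriver.Chrome(options=chrome_options)` plus whichever driver-path argument your Selenium version expects (`executable_path` in Selenium 3, a `Service` object in Selenium 4), used as `with managed_driver(factory) as driver: driver.get(url)`.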

Buga
New Contributor II

Hi,

I'm trying to use this solution to run Selenium in Databricks, but I can't get it to work.

Can you help me?

Please see the attached image.

Hubert-Dudek
Esteemed Contributor III

Hi @Gustavo Queiroz, I've created a new version of the Selenium-on-Databricks guide. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL
