Data Engineering

Failed to fetch archive.ubuntu

Tripalink
New Contributor III

I am trying to use Selenium WebDriver for a scraping project in Databricks. The notebook used to run properly, but it now fails at this step:

Get:1 http://archive.ubuntu.com/ubuntu focal/main amd64 fonts-liberation all 1:1.07.4-11 [822 kB]

In the cells prior to this, I run the following commands:

apt-get clean && sudo apt-get -y upgrade

sudo apt-get install -y

apt install libnss -y

apt install libnss3-dev libgdk-pixbuf2.0-dev libgtk-3-dev libxss-dev -y

sudo apt-get update && sudo apt-get install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libnss3 lsb-release xdg-utils wget ca-certificates google-chrome-stable libgbm1 libu2f-udev libwayland-server0 udev

I attached the cell that fails and the error message. If you have any suggestions, please let me know.

Accepted Solution

Tripalink
New Contributor III

Here is what was added to the notebook to get it to run properly:

to get google-chrome and the Ubuntu version to install properly


6 Replies

Hubert-Dudek
Esteemed Contributor III

Maybe my manual on how to run Selenium on Databricks will help:

In the cluster's Libraries tab, install the PyPI package chromedriver-binary==83.0 (or higher; the version in the script can probably be updated as well).

Run the script below from a notebook to create the "/databricks/scripts/selenium-install.sh" file.

    # Create the folder for cluster init scripts on DBFS.
    dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
    # Write the init script. The shebang has to be the first line of the file,
    # and the final True overwrites any existing copy.
    dbutils.fs.put("/databricks/scripts/selenium-install.sh", """#!/bin/bash
    apt-get update
    apt-get install chromium-browser=91.0.4472.101-0ubuntu0.18.04.1 --yes
    wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_linux64.zip -O /tmp/chromedriver.zip
    mkdir -p /tmp/chromedriver
    unzip -o /tmp/chromedriver.zip -d /tmp/chromedriver/
    """, True)
    # Confirm that the script was written.
    display(dbutils.fs.ls("dbfs:/databricks/scripts/"))

Add "/databricks/scripts/selenium-install.sh" as an init script in the cluster configuration.
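Once the cluster has restarted with the init script attached, a quick check from the notebook can confirm that the script left a usable chromedriver behind. This is only a minimal sketch: the path is the one the install script unzips to, and `chromedriver_ready` is a hypothetical helper name, not part of Databricks or Selenium.

```python
import os

# Path taken from the install script; adjust it if the script unzips elsewhere.
CHROMEDRIVER_PATH = "/tmp/chromedriver/chromedriver"


def chromedriver_ready(path: str = CHROMEDRIVER_PATH) -> bool:
    """Return True if `path` is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


if __name__ == "__main__":
    if chromedriver_ready():
        print(f"chromedriver found at {CHROMEDRIVER_PATH}")
    else:
        print(f"chromedriver missing or not executable at {CHROMEDRIVER_PATH}")
```

If the check fails, the init script most likely stopped early (for example at the apt-get install line), so the cluster's init script logs are the next place to look.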

Later in the notebook, you can use Chrome as in the example below.

    from selenium import webdriver

    # Location of the chromedriver binary unzipped by the install script.
    chrome_driver = '/tmp/chromedriver/chromedriver'
    chrome_options = webdriver.ChromeOptions()
    # The cluster has no display, so Chrome runs headless and without the sandbox.
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    # chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--homedir=/dbfs/tmp')
    chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
    # Optional: send downloads to DBFS without prompting.
    # prefs = {"download.default_directory":"/dbfs/tmp",
    #          "download.prompt_for_download":False
    # }
    # chrome_options.add_experimental_option("prefs",prefs)
    # Selenium 3 style call (executable_path keyword).
    driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
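The `executable_path` keyword belongs to the Selenium 3 API; Selenium 4 deprecated it in favour of a `Service` object and later releases dropped it. If the cluster has Selenium 4 installed, the same setup might look like the sketch below. `chrome_arguments` and `make_driver` are hypothetical helper names introduced here for illustration; the switches and paths are copied from the example above.

```python
CHROME_DRIVER = "/tmp/chromedriver/chromedriver"  # path from the install script


def chrome_arguments(user_data_dir: str = "/dbfs/selenium") -> list:
    """Return the Chrome command-line switches used in the example above."""
    return [
        "--no-sandbox",
        "--headless",
        "--homedir=/dbfs/tmp",
        f"--user-data-dir={user_data_dir}",
    ]


def make_driver(driver_path: str = CHROME_DRIVER):
    """Create a headless Chrome driver through the Selenium 4 Service API."""
    # Imported lazily so the helpers can be defined before selenium is installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for argument in chrome_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)
```

Calling `driver = make_driver()` then gives the same kind of `driver` object as before; remember to call `driver.quit()` when the scrape is done so the browser process does not linger on the cluster.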

I got an error from the second line of the install script.

Debayan
Databricks Employee

Hi @Dagart Allison, since you are running apt-get upgrade, could you please run apt-get update in the previous cell first?

Also, you can try apt-get install (package-name) --fix-missing.
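The ordering suggested here (refresh the package index first, then install with --fix-missing) could be scripted from a Python cell roughly as follows. This is a sketch under assumptions: `build_apt_commands` and `run_commands` are hypothetical helper names, and it assumes the notebook process is allowed to run apt-get without sudo.

```python
import subprocess


def build_apt_commands(packages, fix_missing=True):
    """Return argv lists: refresh the package index first, then install."""
    install = ["apt-get", "install", "-y"]
    if fix_missing:
        install.append("--fix-missing")
    install.extend(packages)
    return [["apt-get", "update"], install]


def run_commands(commands):
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run_commands(build_apt_commands(["fonts-liberation", "libnss3"]))` would run apt-get update and then install the two packages, raising `subprocess.CalledProcessError` if either step fails.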

Hi, I still get the same error I posted previously: the chromium-browser package is not found for that version.
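One possible explanation, offered only as a guess: the version pinned in the install script ends in 0ubuntu0.18.04.1, which looks like an Ubuntu 18.04 build, while the log in the question shows packages coming from the focal (20.04) repository. To see which versions apt actually offers on the cluster, the output of `apt-cache madison` can be listed and compared with the pinned string. `parse_madison` and `available_versions` below are hypothetical helper names, and the parser assumes the usual `package | version | source` line layout.

```python
import subprocess


def parse_madison(output: str) -> list:
    """Extract version strings from `apt-cache madison <package>` output.

    Each line is expected to look like `package | version | source`.
    """
    versions = []
    for line in output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) >= 3 and fields[1] and fields[1] not in versions:
            versions.append(fields[1])
    return versions


def available_versions(package: str) -> list:
    """Ask apt which versions of `package` the configured repositories offer."""
    result = subprocess.run(
        ["apt-cache", "madison", package],
        capture_output=True, text=True, check=True,
    )
    return parse_madison(result.stdout)
```

After an apt-get update, `"91.0.4472.101-0ubuntu0.18.04.1" in available_versions("chromium-browser")` tells you whether the pinned version can be installed at all; if it is False, the pin in the script needs to change to one of the listed versions.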

Hubert-Dudek
Esteemed Contributor III

Hi @Dagart Allison, I've created a new version of the Selenium on Databricks manual. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL
