
Failed to fetch archive.ubuntu

Tripalink
New Contributor III

I am trying to use Selenium WebDriver for a scraping project in Databricks. The notebook used to run properly, but now it fails at this step:

Get:1 http://archive.ubuntu.com/ubuntu focal/main amd64 fonts-liberation all 1:1.07.4-11 [822 kB]

In the cells prior to this, I run the following commands:

apt-get clean && sudo apt-get -y upgrade

sudo apt-get install -y

apt install libnss -y

apt install libnss3-dev libgdk-pixbuf2.0-dev libgtk-3-dev libxss-dev -y

sudo apt-get update && sudo apt-get install -y gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libnss3 lsb-release xdg-utils wget ca-certificates google-chrome-stable libgbm1 libu2f-udev libwayland-server0 udev

I attached the cell that fails and the error message. If you have any suggestions please let me know.

1 ACCEPTED SOLUTION


Tripalink
New Contributor III

Here is what was added to the notebook to get it to run properly:

to get google-chrome and the ubuntu version to properly install


7 REPLIES

Hubert-Dudek
Esteemed Contributor III

Maybe my manual on how to run Selenium on Databricks will help:

In the cluster's Libraries tab, install the PyPI package chromedriver-binary==83.0 (or higher; the version in the script can probably be updated as well).

Run the script below from a notebook to create the "/databricks/scripts/selenium-install.sh" file.

    # Create the folder for cluster init scripts on DBFS.
    dbutils.fs.mkdirs("dbfs:/databricks/scripts/")
    # Write the init script. The string starts right after the opening quotes
    # so that the shebang is the first line of the file.
    dbutils.fs.put("/databricks/scripts/selenium-install.sh","""#!/bin/bash
    apt-get update
    apt-get install chromium-browser=91.0.4472.101-0ubuntu0.18.04.1 --yes
    wget https://chromedriver.storage.googleapis.com/91.0.4472.101/chromedriver_linux64.zip -O /tmp/chromedriver.zip
    mkdir -p /tmp/chromedriver
    unzip -o /tmp/chromedriver.zip -d /tmp/chromedriver/
    """, True)
    # Confirm that the script was written.
    display(dbutils.fs.ls("dbfs:/databricks/scripts/"))

Please add "/databricks/scripts/selenium-install.sh" as an init script in the cluster configuration.
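
Since the install script pins one browser package version and one driver version, it may help to generate the script text from those two values so they can be changed together. The following is only a sketch: `render_init_script` is a hypothetical helper, not part of Databricks, and the version strings are the ones from the script above, which may no longer be available in the Ubuntu archive for your runtime.

```python
def render_init_script(browser_pkg_version, driver_version):
    """Return the init-script text, with the shebang as the very first line."""
    lines = [
        "#!/bin/bash",
        "apt-get update",
        f"apt-get install chromium-browser={browser_pkg_version} --yes",
        "wget https://chromedriver.storage.googleapis.com/"
        f"{driver_version}/chromedriver_linux64.zip -O /tmp/chromedriver.zip",
        "mkdir -p /tmp/chromedriver",
        "unzip -o /tmp/chromedriver.zip -d /tmp/chromedriver/",
    ]
    return "\n".join(lines) + "\n"


# Versions copied from the script above; adjust them for your cluster image.
script = render_init_script("91.0.4472.101-0ubuntu0.18.04.1", "91.0.4472.101")

# On Databricks, the text could then be written with:
# dbutils.fs.put("/databricks/scripts/selenium-install.sh", script, True)
```

Building the text from a list of lines avoids the leading blank line that a triple-quoted string can introduce before the shebang.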

Later in the notebook, you can use chrome, as in the below example.

    from selenium import webdriver
    # Path where the init script unpacked chromedriver.
    chrome_driver = '/tmp/chromedriver/chromedriver'
    chrome_options = webdriver.ChromeOptions()
    # Run Chrome without its sandbox and without a display.
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--headless')
    # chrome_options.add_argument('--disable-dev-shm-usage')
    chrome_options.add_argument('--homedir=/dbfs/tmp')
    chrome_options.add_argument('--user-data-dir=/dbfs/selenium')
    # prefs = {"download.default_directory":"/dbfs/tmp",
    #          "download.prompt_for_download":False
    # }
    # chrome_options.add_experimental_option("prefs",prefs)
    # Selenium 4 deprecates executable_path; there, pass
    # service=Service(chrome_driver) from selenium.webdriver.chrome.service instead.
    driver = webdriver.Chrome(executable_path=chrome_driver, options=chrome_options)
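
If the init script fails (for example because the pinned package version cannot be found), the driver binary will be missing and the `webdriver.Chrome` call fails with a less obvious error. A small pre-check can make that easier to spot. This is a sketch only: `chromedriver_ready` is a hypothetical helper name, and the path is the one used in the example above.

```python
import os


def chromedriver_ready(path):
    """Return True if `path` is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


chrome_driver = "/tmp/chromedriver/chromedriver"
if not chromedriver_ready(chrome_driver):
    # The init script most likely did not finish; check its log on the cluster.
    print(f"chromedriver not found or not executable at {chrome_driver}")
```

Running this in a cell before creating the driver separates "the install did not work" from "Chrome itself failed to start".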

I got an error from the second line of the install script

Debayan
Esteemed Contributor III

Hi @Dagart Allison, before running apt-get upgrade, could you please run apt-get update in the previous cell?

Also, you can try apt-get install (package-name) --fix-missing.
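
As a rough illustration of that order of operations from a Python notebook cell: refresh the package index first, then install with `--fix-missing`. The helper names `build_apt_commands` and `run_apt_commands` are made up for this sketch, and the package names are taken from the original question.

```python
import subprocess


def build_apt_commands(packages, fix_missing=True):
    """Return the apt-get commands to run, with the index refresh first."""
    if not packages:
        raise ValueError("at least one package name is required")
    install = ["apt-get", "install", "-y"]
    if fix_missing:
        install.append("--fix-missing")
    install.extend(packages)
    return [["apt-get", "update"], install]


def run_apt_commands(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


commands = build_apt_commands(["fonts-liberation", "libnss3"])
# run_apt_commands(commands)  # uncomment on the cluster driver (needs root)
```

A "Failed to fetch" error from archive.ubuntu.com often means the local package index points at a package version that the mirror has since replaced, which is why the update step comes before the install.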

Kaniz
Community Manager

Hi @Dagart Allison, we haven't heard from you since the last response from @Debayan Mukherjee, and I was checking back to see if you have a resolution yet.

If you have a solution, please share it with the community, as it can be helpful to others. Otherwise, we will respond with more details and try to help.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.

Tripalink
New Contributor III

Hi, I still get the same error as I previously posted about the chromium-browser not found for that version.


Hubert-Dudek
Esteemed Contributor III

Hi @Dagart Allison, I've created a new version of the Selenium on Databricks manual. Please look here: https://community.databricks.com/s/feed/0D58Y00009SWgVuSAL
