Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Volumes unzip files

vannipart
New Contributor III

I have this shell snippet that I use to install 7-Zip so I can unzip files:

%sh
sudo apt-get update
sudo apt-get install -y p7zip-full
 
But in a new workspace, I get this error:

sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
sudo: a password is required
Reading package lists...
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
bash: line 7: 7z: command not found

How can I unzip files in Volumes if I already have them there?

Should I create a shell init script to do that, or is there another way?

The files are password protected.
This is the unzip command that worked before:

%sh
for file in /dbfs/mnt/zip/$source/*.zip
do
  7z x "$file" -p$pw -o/dbfs/mnt/zip/$source/unzipped/ -y
done
Any good ideas welcome 🙂

2 REPLIES

VZLA
Databricks Employee

Thank you for your question! Have you tried using a cluster init script to install p7zip automatically when the cluster starts? Init scripts run as root, so this avoids the need for sudo during your session.
Alternatively, if unzip is already available on the cluster, you can modify your script like this:

%sh
for file in /dbfs/mnt/zip/$source/*.zip
do
  unzip -P "$pw" "$file" -d /dbfs/mnt/zip/$source/unzipped/
done

karthickrs
New Contributor II

First, you can read the ZIP file in binary format with spark.read.format("binaryFile"), then use Python's zipfile module to extract all the files from the archive and store them in a Volume.
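
A minimal sketch of that approach, under some assumptions: the helper name extract_zip_bytes, the /Volumes/... paths, and the pw variable are placeholders for illustration and not something from the original post. Also note that, as far as I know, the standard-library zipfile module can only decrypt the legacy ZIP encryption; AES-encrypted archives would likely still need 7z or a third-party package.

```python
import io
import os
import zipfile
from typing import List, Optional


def extract_zip_bytes(content: bytes, dest_dir: str,
                      password: Optional[str] = None) -> List[str]:
    """Extract an in-memory ZIP archive into dest_dir; return the member names.

    Hypothetical helper: `content` is the raw bytes of one ZIP file, for
    example the `content` column of a row read with the binaryFile format.
    """
    os.makedirs(dest_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(content)) as archive:
        # zipfile expects the password as bytes; None means "not encrypted".
        pwd = password.encode("utf-8") if password is not None else None
        archive.extractall(path=dest_dir, pwd=pwd)
        return archive.namelist()


# Assumed notebook usage (paths and `pw` are placeholders, adjust to your setup):
# rows = (spark.read.format("binaryFile")
#         .load("/Volumes/<catalog>/<schema>/<volume>/zip/*.zip")
#         .collect())
# for row in rows:
#     extract_zip_bytes(bytes(row.content),
#                       "/Volumes/<catalog>/<schema>/<volume>/unzipped", pw)
```

Keep in mind that collect() pulls every archive's bytes onto the driver, so this only suits files that fit comfortably in driver memory.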

Karthick Ramachandran Seshadri
Data Architect | MS/MBA
Data + AI/ML/GenAI
17x Databricks Credentials
