Volumes unzip files

vannipart
New Contributor III

I use this shell cell to install 7-Zip so I can unzip files:

%sh
sudo apt-get update
sudo apt-get install -y p7zip-full
 
But in a new workspace, I get this error:

sudo: a terminal is required to read the password; either use the -S option to read from standard input or configure an askpass helper
sudo: a password is required
Reading package lists...
E: Could not open lock file /var/lib/apt/lists/lock - open (13: Permission denied)
E: Unable to lock directory /var/lib/apt/lists/
E: Could not open lock file /var/lib/dpkg/lock-frontend - open (13: Permission denied)
E: Unable to acquire the dpkg frontend lock (/var/lib/dpkg/lock-frontend), are you root?
bash: line 7: 7z: command not found

How can I unzip files that are already in a Volume?

Should I write a shell init script for this, or is there another way?

The files are password-protected.
This is the unzip command that worked before:

%sh
for file in /dbfs/mnt/zip/$source/*.zip; do 7z x "$file" -p"$pw" -o"/dbfs/mnt/zip/$source/unzipped/" -y; done
Any good ideas welcome 🙂 
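For reference, the 7-Zip loop above can be approximated without installing anything by using Python's standard-library zipfile module, which can read files in a Volume through its /Volumes/... path. This is a sketch under assumptions: the function name unzip_all and the Volume paths are hypothetical, and zipfile can only decrypt legacy ZipCrypto-protected archives, not AES-encrypted ones.

```python
import zipfile
from pathlib import Path
from typing import List, Optional


def unzip_all(source_dir: str, dest_dir: str, password: Optional[str] = None) -> List[str]:
    """Extract every *.zip in source_dir into dest_dir; return the extracted member names."""
    # zipfile expects the password as bytes
    pwd = password.encode("utf-8") if password else None
    out = Path(dest_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted: List[str] = []
    for zip_path in sorted(Path(source_dir).glob("*.zip")):
        with zipfile.ZipFile(zip_path) as zf:
            # pwd is only used for encrypted members (ZipCrypto only);
            # existing files are overwritten, similar to 7z's -y
            zf.extractall(path=out, pwd=pwd)
            extracted.extend(zf.namelist())
    return extracted


# Hypothetical Volume paths; replace with your own catalog/schema/volume:
# unzip_all("/Volumes/main/default/zip/source", "/Volumes/main/default/zip/source/unzipped", pw)
```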


VZLA
Databricks Employee

Thank you for your question! Have you tried using a cluster init script to install p7zip automatically when the cluster starts? This avoids the need for sudo during your session.
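A minimal sketch of the init-script route, assuming you write the script into a Volume from a notebook and then attach it under the cluster's Advanced options > Init scripts. Cluster-scoped init scripts run as root, so the script drops sudo. The helper name write_init_script and the path are hypothetical; on shared access mode clusters an admin may also need to allowlist the script path in Unity Catalog before it can run.

```python
from pathlib import Path

# Shell commands the cluster runs on each node at startup.
# No sudo here: cluster-scoped init scripts already run as root.
INIT_SCRIPT = """#!/bin/bash
set -euo pipefail
apt-get update
apt-get install -y p7zip-full
"""


def write_init_script(path: str, content: str = INIT_SCRIPT) -> Path:
    """Write an init script to `path` (e.g. a /Volumes/... file) and return its Path."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content)
    return target


# Hypothetical location; use a Volume the cluster is allowed to read:
# write_init_script("/Volumes/main/default/scripts/install-p7zip.sh")
```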
Alternatively, if unzip is already available, you can modify your script like this:

%sh
for file in /dbfs/mnt/zip/$source/*.zip
do
  unzip -P "$pw" "$file" -d /dbfs/mnt/zip/$source/unzipped/
done
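To make one notebook work whether or not 7-Zip was installed, a sketch like the following checks which tool is on the PATH with shutil.which and runs the matching command through subprocess. The function names are hypothetical. The 7z flags are the ones used earlier in this thread; for unzip, -o is added as an assumption so it overwrites existing files without prompting, mirroring 7z's -y.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_extract_cmd(tool: str, zip_file: str, dest: str, password: str) -> List[str]:
    """Return the argv for extracting one password-protected archive with `tool`."""
    if tool == "7z":
        # 7z expects the password and output dir glued to their flags
        return ["7z", "x", zip_file, f"-p{password}", f"-o{dest}", "-y"]
    if tool == "unzip":
        # -o: overwrite without prompting; -P: password; -d: destination dir
        return ["unzip", "-o", "-P", password, zip_file, "-d", dest]
    raise ValueError(f"unsupported tool: {tool}")


def pick_tool() -> Optional[str]:
    """Prefer 7z, fall back to unzip; return None if neither is installed."""
    for tool in ("7z", "unzip"):
        if shutil.which(tool):
            return tool
    return None


def extract_dir(source_dir: str, dest: str, password: str) -> None:
    tool = pick_tool()
    if tool is None:
        raise RuntimeError("neither 7z nor unzip is available on this cluster")
    for zip_file in sorted(Path(source_dir).glob("*.zip")):
        # check=True raises CalledProcessError if extraction fails
        subprocess.run(build_extract_cmd(tool, str(zip_file), dest, password), check=True)
```

Passing an argument list instead of a shell string means passwords containing spaces or shell metacharacters need no extra quoting.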

karthickrs
New Contributor II

You can read the ZIP file as binary with spark.read.format("binaryFile"), then use Python's zipfile module to extract all of the files in the archive and write them to a Volume.
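A minimal sketch of that approach, assuming the archives fit in driver memory: the binaryFile source returns one row per file with path and content (the raw bytes) columns, and zipfile can open those bytes through io.BytesIO. The helper name extract_zip_bytes and the paths in the usage comment are hypothetical.

```python
import io
import zipfile
from pathlib import Path
from typing import List, Optional


def extract_zip_bytes(content: bytes, dest_dir: str, password: Optional[str] = None) -> List[str]:
    """Extract a ZIP archive held in memory into dest_dir; return its member names."""
    pwd = password.encode("utf-8") if password else None
    Path(dest_dir).mkdir(parents=True, exist_ok=True)
    # BytesIO gives zipfile the seekable file object it needs
    with zipfile.ZipFile(io.BytesIO(content)) as zf:
        zf.extractall(path=dest_dir, pwd=pwd)
        return zf.namelist()


# Usage on Databricks (hypothetical Volume paths; `spark` is the notebook session):
# rows = spark.read.format("binaryFile").load("/Volumes/main/default/zip/source/*.zip").collect()
# for row in rows:
#     extract_zip_bytes(row.content, "/Volumes/main/default/zip/source/unzipped", pw)
```

Because collect() copies every archive to the driver, for large files it may be simpler to open the /Volumes/... path directly with zipfile.ZipFile(path) and skip Spark entirely.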

Karthick Ramachandran Seshadri
Data Architect | MS/MBA
Data + AI/ML/GenAI
17x Databricks Credentials
