Data Engineering

Unzip multipart files

N_M
Contributor

Hi all,

Because of file size and file transfer limits, we receive very large files that are compressed and split into parts, named as follows:
    FILE.z01, FILE.z02,...,FILE.zip

However, I can't find a way to unzip multipart files in Databricks.

I have already tried some of the approaches found on the web, such as:

 - [bash] cat FILE*.z* > FILE_FULL.zip && unzip FILE_FULL.zip
 - [python] append files + use zipfile package
 - combination of the above

In all cases, neither the combined zip nor the individual parts are recognized as valid archives, so nothing gets extracted.
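For reference, the Python attempt can be sketched roughly as below. This is a minimal sketch, not a working solution: the helper names (`ordered_parts`, `concatenate`, `try_extract`) are made up for illustration, and it assumes the parts sit in one directory and follow the FILE.z01, FILE.z02, ..., FILE.zip naming shown above. As far as I understand, a split archive is not just a byte-wise cut of a single zip, which would explain why the last step reports failure on real multipart files.

```python
import re
import shutil
import zipfile
from pathlib import Path


def ordered_parts(directory, stem):
    """Return the parts in order: stem.z01, stem.z02, ..., with stem.zip last."""
    directory = Path(directory)
    numbered = sorted(
        (p for p in directory.iterdir()
         if re.fullmatch(re.escape(stem) + r"\.z\d+", p.name)),
        key=lambda p: int(p.suffix[2:]),  # ".z01" -> 1
    )
    last = directory / f"{stem}.zip"
    return numbered + ([last] if last.exists() else [])


def concatenate(parts, target):
    """Append the raw bytes of every part, in the given order, into target."""
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(target)


def try_extract(archive, destination):
    """Return True if zipfile can extract the archive, False if it rejects it."""
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(destination)
        return True
    except zipfile.BadZipFile:
        return False
```

Usage would be along the lines of `try_extract(concatenate(ordered_parts(folder, "FILE"), folder / "FILE_FULL.zip"), folder / "out")`, where `folder` is a hypothetical `Path` pointing at the directory that holds the parts.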

The unzip command does not officially support multipart files (see unzip(1) - Linux man page, die.net), and the only solution I found is to use it in conjunction with the zip command, as described in that man page:

[bash] zip -s- FILE*.z* -O FILE_FULL.zip && unzip FILE_FULL.zip

However, the zip command is not available in Databricks, so I could only test this locally.
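For anyone who wants to try the same thing from a notebook, here is a minimal Python sketch that runs the merge only when a zip binary is actually on the PATH. The function names are hypothetical. It assumes zip 3.0 or later and passes the final FILE.zip part as the input archive, following the `zip -s- inarchive -O outarchive` form given in the unzip man page; I have not been able to run it on a cluster.

```python
import shutil
import subprocess


def build_merge_command(last_part, merged):
    """Build the `zip -s- inarchive -O outarchive` call from the unzip man page."""
    return ["zip", "-s-", str(last_part), "-O", str(merged)]


def merge_split_archive(last_part, merged):
    """Merge the split archive if zip is installed; return False when it is missing."""
    if shutil.which("zip") is None:
        return False  # the situation described above: no zip binary available
    subprocess.run(build_merge_command(last_part, merged), check=True)
    return True
```

If `merge_split_archive("FILE.zip", "FILE_FULL.zip")` returns `True`, the merged file should then be readable by a plain `unzip FILE_FULL.zip`.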

I am asking here because the answer may be useful for similar cases.

Do you have any suggestions?

 

