Hi all,
Due to file size and file transfer limitations, we receive huge files that are compressed and split, in the format
FILE.z01, FILE.z02,...,FILE.zip
However, I can't find a way to unzip multipart files in Databricks.
I have already tried some of the steps found on the web, such as
- [bash] cat FILE*.z* > FILE_FULL.zip && unzip FILE_FULL.zip
- [python] append files + use zipfile package
- combination of the above
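A minimal sketch of the Python attempt (append the parts in order, then open the result with the zipfile package). The helper names part_order, concatenate_parts and try_extract are made up for illustration. Note that this is a plain byte-level concatenation: parts written by a real split-archive tool may carry per-part offsets, so try_extract can legitimately return False on them.

```python
import re
import shutil
import zipfile
from pathlib import Path


def part_order(path: Path) -> float:
    """Sort key: .z01, .z02, ... by number; the final .zip part goes last."""
    match = re.fullmatch(r"\.z(\d+)", path.suffix.lower())
    return int(match.group(1)) if match else float("inf")


def concatenate_parts(directory: str, stem: str, out_name: str) -> Path:
    """Append all parts named <stem>.z* into one file (hypothetical helper)."""
    folder = Path(directory)
    # Collect and sort the parts before the output file is created.
    parts = sorted(
        (p for p in folder.glob(f"{stem}.z*") if p.name != out_name),
        key=part_order,
    )
    out_path = folder / out_name
    with open(out_path, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                # Stream the copy so huge parts are never loaded into memory.
                shutil.copyfileobj(src, out)
    return out_path


def try_extract(archive: Path, target_dir: str) -> bool:
    """Extract the archive; return False if zipfile rejects it."""
    try:
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
    except zipfile.BadZipFile:
        return False
    return True
```

Sorting by part number rather than by name keeps the order correct once there are more than 99 parts, where a plain alphabetical sort would put .z100 before .z11.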
In all cases, neither the merged zip nor the individual parts are recognized as valid archives, so nothing gets extracted.
The unzip command does not officially support multipart files (see unzip(1) - Linux man page (die.net)), and the only solution I found is to use it together with the zip command (as described in the unzip man page above):
[bash] zip -s- FILE*.z* -O FILE_FULL.zip && unzip FILE_FULL.zip
However, the zip command is not available in Databricks, so I could only test this locally.
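A minimal sketch of how one could check from a Python cell whether the zip tool exists on the cluster before attempting the merge. The function names find_tool, merge_command and merge_split_archive are hypothetical. The command uses the `zip -s 0 <split>.zip --out <merged>.zip` form from the zip manual, which takes the last part (FILE.zip) as input; this is an assumption about the right invocation, not something verified on Databricks.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def find_tool(candidates: Sequence[str]) -> Optional[str]:
    """Return the path of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def merge_command(last_part: str, out_path: str) -> List[str]:
    """Build the zip call that rewrites a split archive as a single file."""
    # "-s 0" sets the split size to zero, i.e. no splitting in the output.
    return ["zip", "-s", "0", last_part, "--out", out_path]


def merge_split_archive(last_part: str, out_path: str) -> bool:
    """Run the merge if zip is installed; return False when it is missing."""
    if find_tool(["zip"]) is None:
        return False
    subprocess.run(merge_command(last_part, out_path), check=True)
    return True
```

With the files from this post, the call would be merge_split_archive("FILE.zip", "FILE_FULL.zip"), followed by a normal unzip of FILE_FULL.zip if it returns True.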
I decided to ask here because the answer may be useful for similar cases.
Do you have any suggestions?