Re: Databricks reading from a zip file

tariq · ‎10-25-2022

I have mounted an Azure Blob Storage container in the Azure Databricks workspace filestore. The mounted container holds zip files with CSV files inside them. What is the best way to read the zip files and write the data into a Delta table?

@sasikumar sagabala

Debayan · ‎10-25-2022

Hi @Tarique Anwar, Hadoop does not support zip files as a compression codec. Text files in GZip, BZip2, and other supported compression formats are decompressed automatically in Apache Spark as long as they have the right file extension, but you must perform additional steps to read zip files.

The notebooks in the documentation linked below show how to read zip files. After you download a zip file to a temp directory, you can use the Azure Databricks %sh magic command to run unzip on the file. For the sample file used in the notebooks, the tail step removes a comment line from the unzipped file.
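If you would rather stay in Python than shell out, here is a minimal sketch of the same idea using the standard-library zipfile module. The function names extract_csvs and load_into_delta, and the paths and table name in the usage note, are made up for illustration and are not taken from the linked notebooks. It assumes the CSV files have a header row and that member names do not collide once folder prefixes are dropped.

```python
import os
import shutil
import zipfile


def extract_csvs(zip_path, out_dir):
    """Extract every .csv member of zip_path into out_dir; return the written paths."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            # Keep only the file name so members cannot escape out_dir.
            name = os.path.basename(member.filename)
            # Skip directories and anything that is not a CSV file.
            if member.is_dir() or not name.lower().endswith(".csv"):
                continue
            target = os.path.join(out_dir, name)
            # Stream the member to disk instead of loading it into memory.
            with archive.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return sorted(written)


def load_into_delta(spark, csv_dir, table_name):
    """Read the extracted CSVs with Spark and append them to a Delta table."""
    # Assumption: every CSV has a header row and a compatible schema.
    df = spark.read.option("header", True).csv(csv_dir)
    df.write.format("delta").mode("append").saveAsTable(table_name)
```

In a notebook you could call something like extract_csvs("/dbfs/mnt/mycontainer/data.zip", "/dbfs/mnt/mycontainer/unzipped") and then load_into_delta(spark, "dbfs:/mnt/mycontainer/unzipped", "my_table"). Those paths and the table name are placeholders; the first function needs a local-style path, while Spark needs a path it can resolve itself.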

Please refer to: https://learn.microsoft.com/en-us/azure/databricks/external-data/zip-files

Please let us know if this helps.

Rishitha · ‎06-28-2023

Hello @Debayan, I recently came across a similar scenario. Is there a way to do this via Auto Loader? We have zip files added daily to our AWS S3 bucket, and we want to be able to unzip them and load the CSV files continuously (autoloading).