Databricks reading from a zip file
10-25-2022 07:53 AM
I have mounted an Azure Blob Storage container in the Azure Databricks workspace filestore. The mounted container holds zip files that contain CSV files. What is the best way to read the zipped files and write them into a Delta table?
@sasikumar sagabala
Labels: Azure
10-25-2022 03:17 PM
Hi @Tarique Anwar , Hadoop does not have support for zip files as a compression codec. While a text file in GZip, BZip2, and other supported compression formats can be configured to be automatically decompressed in Apache Spark as long as it has the right file extension, you must perform additional steps to read zip files.
The notebooks in the documentation linked below show how to read zip files. After you download a zip file to a temp directory, you can use the Azure Databricks %sh magic command to run unzip on the file. For the sample file used in the notebooks, the tail step removes a comment line from the unzipped file.
Please refer to: https://learn.microsoft.com/en-us/azure/databricks/external-data/zip-files
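If you would rather stay in Python than shell out, here is a minimal sketch of the same unzip-then-read idea using the standard zipfile module. The helper name extract_csvs, the paths, and the table name are all hypothetical, and the commented Spark lines assume a Databricks notebook where `spark` is predefined and the /dbfs local path is available on your cluster; treat it as a starting point rather than the documented method.

```python
import zipfile
from pathlib import Path


def extract_csvs(zip_path, out_dir):
    """Extract every .csv member of zip_path into out_dir and return the extracted paths.

    Hypothetical helper for illustration; non-CSV members are skipped.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        for member in zf.namelist():
            if member.lower().endswith(".csv"):
                # extract() returns the path of the file it created
                extracted.append(zf.extract(member, str(out)))
    return sorted(extracted)


# Assumed usage in a Databricks notebook (placeholder paths and table name):
# extract_csvs("/dbfs/mnt/<mount-name>/archive.zip", "/dbfs/tmp/unzipped")
# df = (spark.read
#       .option("header", "true")
#       .option("recursiveFileLookup", "true")  # CSVs may sit in subfolders of the archive
#       .csv("dbfs:/tmp/unzipped"))
# df.write.format("delta").mode("append").saveAsTable("my_table")
```

Extraction runs on the driver, so this suits a modest number of archives; for many large zip files you would probably want to distribute the work.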
Please let us know if this helps.
06-28-2023 11:04 AM
Hello @Debayan, I recently came across a similar scenario. Is there a way to do this with Auto Loader? Zip folders are added daily to our AWS S3 bucket, and we want to unzip them and load the CSV files continuously (auto loading).