Expand and read Zip compressed files not working
12-06-2023 10:25 PM
I am trying to unzip compressed files following this doc (https://docs.databricks.com/en/files/unzip-files.html), but I am getting an error.
When I run:
dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv")
I get the following error:
java.io.FileNotFoundException: File file:/LoanStats3a.csv does not exist
Where is the unzipped CSV file being saved? I also tried "file:/tmp/file:/LoanStats3a.csv" as the location, but that did not work either.
Labels: Spark
12-10-2023 05:52 PM
I didn't solve the issue with that example, but I figured out how to specify where the decompressed files are saved by using a decompression tool directly.
I used gunzip to unzip my own gzip files like this:
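The original post does not include the command, so here is a minimal, self-contained sketch of the gunzip approach. The file names and paths are placeholders, not from the thread; on Databricks you would run this in a `%sh` cell and use a `/dbfs/...` path instead of `/tmp`:

```shell
# Hypothetical example: create a small gzip file, then decompress it.
echo "id,amount" > /tmp/sample.csv
gzip /tmp/sample.csv               # replaces the file with /tmp/sample.csv.gz

# -k keeps the .gz archive; the decompressed file lands next to it,
# so the output location is controlled by where the .gz file sits.
gunzip -k /tmp/sample.csv.gz
cat /tmp/sample.csv
```

Note that gunzip always writes the output alongside the input, so placing the `.gz` file in the desired directory first is what controls where the decompressed file ends up.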
12-11-2023 07:38 AM - edited 12-11-2023 07:40 AM
Hey @MrDataMan,
I wasn't able to reproduce the exact error you got, but I still hit a similar one while trying to run the example. To solve it, I tweaked the code a little bit:
%sh curl https://resources.lendingclub.com/LoanStats3a.csv.zip --output /dbfs/tmp/LoanStats3a.csv.zip
unzip /dbfs/tmp/LoanStats3a.csv.zip -d /dbfs/tmp/
As you can see, I changed the output location of the curl command and specified the destination of the unzip command (with -d) so that both point to DBFS instead of the driver's local tmp/ directory.
Then we can read it using Spark:
df = spark.read.format("csv").option("skipRows", 1).option("header", True).load("dbfs:/tmp/LoanStats3a.csv")
display(df)
Note: Access to DBFS is required for this example.
Thanks,
Gab

