<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Expand and read Zip compressed files not working in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/54815#M30169</link>
    <description>&lt;P&gt;I am trying to unzip compressed files following this doc (&lt;A href="https://docs.databricks.com/en/files/unzip-files.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/files/unzip-files.html&lt;/A&gt;), but I am getting an error.&lt;/P&gt;&lt;P&gt;When I run:&lt;/P&gt;&lt;PRE&gt;dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv")&lt;/PRE&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;PRE&gt;java.io.FileNotFoundException: File file:/LoanStats3a.csv does not exist&lt;/PRE&gt;&lt;P&gt;Where is the unzipped CSV file being saved? I also tried "file:/tmp/file:/LoanStats3a.csv" as the location, but that did not work either.&lt;/P&gt;</description>
    <pubDate>Thu, 07 Dec 2023 06:25:25 GMT</pubDate>
    <dc:creator>MrDataMan</dc:creator>
    <dc:date>2023-12-07T06:25:25Z</dc:date>
    <item>
      <title>Expand and read Zip compressed files not working</title>
      <link>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/54815#M30169</link>
      <description>&lt;P&gt;I am trying to unzip compressed files following this doc (&lt;A href="https://docs.databricks.com/en/files/unzip-files.html" target="_blank" rel="noopener"&gt;https://docs.databricks.com/en/files/unzip-files.html&lt;/A&gt;), but I am getting an error.&lt;/P&gt;&lt;P&gt;When I run:&lt;/P&gt;&lt;PRE&gt;dbutils.fs.mv("file:/LoanStats3a.csv", "dbfs:/tmp/LoanStats3a.csv")&lt;/PRE&gt;&lt;P&gt;I get the following error:&lt;/P&gt;&lt;PRE&gt;java.io.FileNotFoundException: File file:/LoanStats3a.csv does not exist&lt;/PRE&gt;&lt;P&gt;Where is the unzipped CSV file being saved? I also tried "file:/tmp/file:/LoanStats3a.csv" as the location, but that did not work either.&lt;/P&gt;</description>
      <pubDate>Thu, 07 Dec 2023 06:25:25 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/54815#M30169</guid>
      <dc:creator>MrDataMan</dc:creator>
      <dc:date>2023-12-07T06:25:25Z</dc:date>
    </item>
    <item>
      <title>Re: Expand and read Zip compressed files not working</title>
      <link>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/55009#M30217</link>
      <description>&lt;P&gt;I didn't solve the issue with this example, but I figured out how to control where the unzipped files are saved by calling the decompression tool directly.&lt;/P&gt;&lt;P&gt;I used gunzip to decompress my own gzip files like this:&lt;/P&gt;&lt;DIV&gt;&lt;DIV&gt;for file in "$SOURCE_DIR"/*.gz; do&lt;/DIV&gt;&lt;DIV&gt;echo "Unzipping $file..."&lt;/DIV&gt;&lt;DIV&gt;gunzip -c "$file" &amp;gt; "$TARGET_DIR/$(basename "$file" .gz)"&lt;/DIV&gt;&lt;DIV&gt;rm "$file" # Delete the original .gz file&lt;/DIV&gt;&lt;DIV&gt;done&lt;/DIV&gt;&lt;/DIV&gt;</description>
      <pubDate>Mon, 11 Dec 2023 01:52:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/55009#M30217</guid>
      <dc:creator>MrDataMan</dc:creator>
      <dc:date>2023-12-11T01:52:47Z</dc:date>
    </item>
    <item>
      <title>Re: Expand and read Zip compressed files not working</title>
      <link>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/55064#M30232</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/95831"&gt;@MrDataMan&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;I wasn't able to reproduce the exact error you got, but I hit a similar one while running the example. To solve it, I tweaked the code a little:&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;%sh curl https://resources.lendingclub.com/LoanStats3a.csv.zip --output /dbfs/tmp/LoanStats3a.csv.zip
unzip /dbfs/tmp/LoanStats3a.csv.zip -d /dbfs/tmp/&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;As you can see, I changed the output location of the curl command and specified a destination for the unzip command, so that both point to DBFS (through the /dbfs mount) instead of the local /tmp directory.&lt;/P&gt;
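&lt;P&gt;If a shell cell is not convenient, the same idea (extracting to an explicitly chosen directory) could be sketched in Python with the standard zipfile module. This is only a minimal sketch: the helper name unzip_to and the demo file names are hypothetical, and on Databricks the paths would presumably be the /dbfs/tmp/ ones used above.&lt;/P&gt;

```python
import os
import tempfile
import zipfile


def unzip_to(zip_path, target_dir):
    """Extract every member of zip_path into target_dir and return the extracted paths."""
    os.makedirs(target_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        return [os.path.join(target_dir, name) for name in archive.namelist()]


# Self-contained demo with a throwaway archive (hypothetical file names).
# On Databricks the arguments would instead be something like
# "/dbfs/tmp/LoanStats3a.csv.zip" and "/dbfs/tmp/".
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.csv.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("demo.csv", "id,amount\n1,100\n")

extracted = unzip_to(demo_zip, os.path.join(work_dir, "out"))
print(extracted)
```

&lt;P&gt;Because the function returns the extracted paths, there is no need to guess where the CSV file ended up before moving or reading it.&lt;/P&gt;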
&lt;P&gt;Then we can read it using Spark:&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;df = spark.read.format("csv").option("skipRows", 1).option("header", True).load("dbfs:/tmp/LoanStats3a.csv")
display(df)&lt;/LI-CODE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&lt;STRONG&gt;Note&lt;/STRONG&gt;: Access to DBFS is required for this example.&lt;/P&gt;
&lt;P&gt;Thanks,&lt;/P&gt;
&lt;P&gt;Gab&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 11 Dec 2023 15:40:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/expand-and-read-zip-compressed-files-not-working/m-p/55064#M30232</guid>
      <dc:creator>gabsylvain</dc:creator>
      <dc:date>2023-12-11T15:40:24Z</dc:date>
    </item>
  </channel>
</rss>

