<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: unzip twice the same file not executing in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11864#M6779</link>
    <description>&lt;P&gt;I would like my code to be fully written in Python if possible.&lt;/P&gt;</description>
    <pubDate>Fri, 26 Nov 2021 18:26:21 GMT</pubDate>
    <dc:creator>RantoB</dc:creator>
    <dc:date>2021-11-26T18:26:21Z</dc:date>
    <item>
      <title>unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11849#M6764</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I need to unzip some ingested files, but when I unzip the same zipped file a second time, the unzip command does not complete:&lt;/P&gt;&lt;P&gt;As suggested in the documentation, I ran:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import urllib.request
urllib.request.urlretrieve("https://resources.lendingclub.com/LoanStats3a.csv.zip", "/tmp/LoanStats3a.csv.zip")&lt;/CODE&gt;&lt;/PRE&gt;&lt;PRE&gt;&lt;CODE&gt;%sh
unzip /tmp/LoanStats3a.csv.zip&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;but when I run unzip again, the command never finishes and seems to be stuck in a loop with no output.&lt;/P&gt;&lt;P&gt;Thanks for your help.&lt;/P&gt;</description>
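One plausible explanation for the behaviour described above, not confirmed in this thread: the `unzip` command line tool asks for confirmation before replacing files that already exist unless it is given `-o` (overwrite) or `-n` (never overwrite), and in a non-interactive `%sh` cell that prompt can look like a command that never returns. Python's `zipfile.ZipFile.extractall` does not prompt and simply replaces existing files. The sketch below illustrates that with a hypothetical helper named `extract_overwriting`; the name and the paths are assumptions for illustration only.

```python
import zipfile
from pathlib import Path


def extract_overwriting(zip_path, target_dir):
    """Extract every member of zip_path into target_dir.

    ZipFile.extractall() replaces files that already exist and never asks
    for confirmation, so calling this function repeatedly does not block.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as archive:
        archive.extractall(target)
        # Return the member names so the caller can see what was written.
        return sorted(archive.namelist())
```

Calling `extract_overwriting("/tmp/LoanStats3a.csv.zip", "/tmp/loans")` twice in a row should give the same result both times, assuming the archive from the example above is present on the driver.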
      <pubDate>Tue, 02 Nov 2021 15:17:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11849#M6764</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-02T15:17:44Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11851#M6766</link>
      <description>&lt;P&gt;Ok.&lt;/P&gt;&lt;P&gt;Note that I see the same behaviour when I'm using the Python API:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import zipfile
with zipfile.ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Nov 2021 15:24:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11851#M6766</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-02T15:24:23Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11852#M6767</link>
      <description>&lt;P&gt;If you're going to be reading the files with Spark, you don't need to unzip them. &lt;A href="http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.csv.html?highlight=csv#pyspark.sql.DataFrameReader.csv" alt="http://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.csv.html?highlight=csv#pyspark.sql.DataFrameReader.csv" target="_blank"&gt;Spark's CSV reader&lt;/A&gt; can read zipped or unzipped CSVs.&lt;/P&gt;&lt;P&gt;If you're going to be using urlretrieve, remember that it puts the files on the driver and not in DBFS, so you'll have to move them into the distributed filesystem before Spark can read them.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Nov 2021 15:47:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11852#M6767</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-02T15:47:12Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11853#M6768</link>
      <description>&lt;P&gt;Actually, I will not be reading the files with Spark at this stage, and I am not using urlretrieve either; that was just for the reproducible example.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Zipped files are ingested into ADLS Gen2 and I unzip them into distinct directories depending on their names. But when I execute my script a second time, I run into the problem I described above.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Nov 2021 15:54:29 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11853#M6768</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-02T15:54:29Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11854#M6769</link>
      <description>&lt;P&gt;So when you use %sh, it uses the file system on the driver, which is temporary. The driver storage is the local disk of a VM, not ADLS Gen2.&lt;/P&gt;</description>
      <pubDate>Tue, 02 Nov 2021 15:58:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11854#M6769</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2021-11-02T15:58:05Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11855#M6770</link>
      <description>&lt;P&gt;Yes, I understand, but regardless of what I do with the unzipped files, my question is: why is there a problem when executing the unzip action twice?&lt;/P&gt;</description>
      <pubDate>Tue, 02 Nov 2021 16:02:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11855#M6770</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-02T16:02:31Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11856#M6771</link>
      <description>&lt;OL&gt;&lt;LI&gt;Community Edition may block saving to the file system or executing %sh.&lt;/LI&gt;&lt;LI&gt;In other editions, please verify that the file is there. It can be saved, for example, to a DBFS folder.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;The following commands could help:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dbutils.fs.ls("dbfs:/tmp/")
%sh
ls /dbfs/tmp&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;You can also consider adjusting your script to use the /dbfs prefix, for example:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import urllib.request
urllib.request.urlretrieve("https://resources.lendingclub.com/LoanStats3a.csv.zip", "/dbfs/tmp/LoanStats3a.csv.zip")&lt;/CODE&gt;&lt;/PRE&gt;</description>
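The check suggested above (verify that the file is really there) can also be done from Python. The following is a minimal sketch; `describe_archive` is a hypothetical helper name, and the path in the usage note is only an assumption taken from the example in this post.

```python
import os
import zipfile


def describe_archive(path):
    """Return basic facts about a file that is expected to be a zip archive."""
    exists = os.path.isfile(path)
    return {
        "exists": exists,
        # Size is only meaningful when the file is present.
        "size_bytes": os.path.getsize(path) if exists else None,
        # is_zipfile() inspects the file contents, not the extension.
        "is_zip": zipfile.is_zipfile(path) if exists else False,
    }
```

For example, `describe_archive("/dbfs/tmp/LoanStats3a.csv.zip")` would report whether the download landed where expected and whether it is a readable zip archive.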
      <pubDate>Tue, 02 Nov 2021 16:25:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11856#M6771</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-11-02T16:25:56Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11857#M6772</link>
      <description>&lt;P&gt;I am trying to run:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;import zipfile
with zipfile.ZipFile(path, 'r') as zip_ref:
    zip_ref.extractall(directory_to_extract_to)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;but I think I am facing some issues because my zip file is quite large.&lt;/P&gt;</description>
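To tell a slow extraction apart from one that is genuinely stuck, the archive can be extracted one member at a time with a log line per member. This is only a diagnostic sketch; `extract_with_progress` and its `log` parameter are hypothetical names, not something used elsewhere in this thread.

```python
import zipfile
from pathlib import Path


def extract_with_progress(zip_path, target_dir, log=print):
    """Extract members one by one, logging each before it is written."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zip_path, "r") as archive:
        members = archive.infolist()
        for index, member in enumerate(members, start=1):
            # file_size is the uncompressed size recorded in the archive.
            log(f"[{index}/{len(members)}] {member.filename} ({member.file_size} bytes)")
            archive.extract(member, target)
            extracted.append(member.filename)
    return extracted
```

If the log stops at one particular member, that file or the destination it is being written to is the place to look.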
      <pubDate>Tue, 02 Nov 2021 17:53:24 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11857#M6772</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-02T17:53:24Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11858#M6773</link>
      <description>&lt;P&gt;Hi @Bertrand BURCKER,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Are you still having this issue, or were you able to solve it? Please let us know.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Nov 2021 00:23:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11858#M6773</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-16T00:23:33Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11859#M6774</link>
      <description>&lt;P&gt;No, still not solved.&lt;/P&gt;</description>
      <pubDate>Tue, 16 Nov 2021 08:32:55 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11859#M6774</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-16T08:32:55Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11860#M6775</link>
      <description>&lt;P&gt;Hi @Bertrand BURCKER, since you mentioned that your zip file is large, can you let us know the size of the file?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Also, have you tried with a smaller zip file, and what was the result?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Nov 2021 16:39:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11860#M6775</guid>
      <dc:creator>Prabakar</dc:creator>
      <dc:date>2021-11-16T16:39:05Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11861#M6776</link>
      <description>&lt;P&gt;Which Databricks version are you using: Azure or the free Community Edition?&lt;/P&gt;</description>
      <pubDate>Tue, 16 Nov 2021 18:04:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11861#M6776</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-11-16T18:04:08Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11862#M6777</link>
      <description>&lt;P&gt;Have you tried the examples from this article? &lt;A href="https://docs.databricks.com/data/data-sources/zip-files.html" alt="https://docs.databricks.com/data/data-sources/zip-files.html" target="_blank"&gt;link&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 22 Nov 2021 22:45:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11862#M6777</guid>
      <dc:creator>jose_gonzalez</dc:creator>
      <dc:date>2021-11-22T22:45:46Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11863#M6778</link>
      <description>&lt;P&gt;My file is 180 MiB. For information, the cluster is a single-node Standard_F4s.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Nov 2021 18:09:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11863#M6778</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-26T18:09:49Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11864#M6779</link>
      <description>&lt;P&gt;I would like my code to be fully written in Python if possible.&lt;/P&gt;</description>
      <pubDate>Fri, 26 Nov 2021 18:26:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11864#M6779</guid>
      <dc:creator>RantoB</dc:creator>
      <dc:date>2021-11-26T18:26:21Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11865#M6780</link>
      <description>&lt;P&gt;Could you please try outside Community Edition? There must be some restriction on %sh there.&lt;/P&gt;</description>
      <pubDate>Tue, 30 Nov 2021 06:54:43 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11865#M6780</guid>
      <dc:creator>Atanu</dc:creator>
      <dc:date>2021-11-30T06:54:43Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11866#M6781</link>
      <description>&lt;UL&gt;&lt;LI&gt;Could you please try outside Community Edition? There must be some restriction on %sh there.&lt;/LI&gt;&lt;/UL&gt;</description>
      <pubDate>Tue, 30 Nov 2021 06:54:54 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11866#M6781</guid>
      <dc:creator>Atanu</dc:creator>
      <dc:date>2021-11-30T06:54:54Z</dc:date>
    </item>
    <item>
      <title>Re: unzip twice the same file not executing</title>
      <link>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11867#M6782</link>
      <description>&lt;P&gt;Another problem is that DBFS storage doesn't support random writes (used by zip):&lt;/P&gt;&lt;P&gt;&lt;I&gt;Does not support random writes. For workloads that require random writes, perform the operations on local disk first and then copy the result to /dbfs&lt;/I&gt;&lt;/P&gt;&lt;P&gt;Source: &lt;A href="https://docs.databricks.com/data/databricks-file-system.html#local-file-api-limitations" target="_blank"&gt;https://docs.databricks.com/data/databricks-file-system.html#local-file-api-limitations&lt;/A&gt;&lt;/P&gt;</description>
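The workaround quoted above (do the work on local disk, then copy the result) can be sketched in pure Python. This is a minimal illustration under assumptions: `extract_via_local_disk` is a hypothetical helper, the destination is assumed to be a directory under `/dbfs` or another mounted path, and empty directories inside the archive are not recreated.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_via_local_disk(zip_path, final_dir):
    """Extract into a local scratch directory, then copy files to final_dir."""
    final = Path(final_dir)
    copied = []
    with tempfile.TemporaryDirectory() as scratch:
        # Step 1: extraction happens on local disk only.
        with zipfile.ZipFile(zip_path, "r") as archive:
            archive.extractall(scratch)
        # Step 2: copy each extracted file with a plain sequential write.
        for source in sorted(Path(scratch).rglob("*")):
            if not source.is_file():
                continue
            relative = source.relative_to(scratch)
            destination = final / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(source, destination)
            copied.append(relative.as_posix())
    return copied
```

`shutil.copyfile` is used instead of `shutil.copytree` so that only file contents are written and no metadata is copied; whether that matters on a given mount is an assumption, not something established in this thread. Running the function a second time replaces the files that are already in the destination.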
      <pubDate>Wed, 15 Dec 2021 13:39:44 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/unzip-twice-the-same-file-not-executing/m-p/11867#M6782</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2021-12-15T13:39:44Z</dc:date>
    </item>
  </channel>
</rss>

