<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Where does the files downloaded from wget get stored in Databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32337#M23562</link>
    <description>&lt;P&gt;Hey Team!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;All I'm trying to do is download a CSV file stored on S3 and read it using Spark.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's what I mean:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;!wget &lt;A href="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv" target="test_blank"&gt;https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;If I download this "yellow_tripdata_2020-01.csv", where exactly would it be stored?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The wget output is below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;--2022-01-04 12:38:48--  &lt;A href="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv" target="test_blank"&gt;https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv&lt;/A&gt;
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.193.8
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.193.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 593610736 (566M) [text/csv]
Saving to: ‘yellow_tripdata_2020-01.csv’
&amp;nbsp;
yellow_tripdata_202 100%[===================&amp;gt;] 566.11M  14.9MB/s    in 42s     
&amp;nbsp;
2022-01-04 12:39:31 (13.5 MB/s) - ‘yellow_tripdata_2020-01.csv’ saved [593610736/593610736]&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Tagging @Kaniz Fatma and @Harikrishnan Kunhumveettil for better reach.&lt;/P&gt;</description>
    <pubDate>Tue, 04 Jan 2022 13:26:00 GMT</pubDate>
    <dc:creator>RiyazAliM</dc:creator>
    <dc:date>2022-01-04T13:26:00Z</dc:date>
    <item>
      <title>Where does the files downloaded from wget get stored in Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32337#M23562</link>
      <description>&lt;P&gt;Hey Team!&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;All I'm trying to do is download a CSV file stored on S3 and read it using Spark.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Here's what I mean:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;!wget &lt;A href="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv" target="test_blank"&gt;https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv&lt;/A&gt;&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;If I download this "yellow_tripdata_2020-01.csv", where exactly would it be stored?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The wget output is below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;--2022-01-04 12:38:48--  &lt;A href="https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv" target="test_blank"&gt;https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2020-01.csv&lt;/A&gt;
Resolving s3.amazonaws.com (s3.amazonaws.com)... 54.231.193.8
Connecting to s3.amazonaws.com (s3.amazonaws.com)|54.231.193.8|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 593610736 (566M) [text/csv]
Saving to: ‘yellow_tripdata_2020-01.csv’
&amp;nbsp;
yellow_tripdata_202 100%[===================&amp;gt;] 566.11M  14.9MB/s    in 42s     
&amp;nbsp;
2022-01-04 12:39:31 (13.5 MB/s) - ‘yellow_tripdata_2020-01.csv’ saved [593610736/593610736]&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Any help would be appreciated.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Tagging @Kaniz Fatma and @Harikrishnan Kunhumveettil for better reach.&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jan 2022 13:26:00 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32337#M23562</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2022-01-04T13:26:00Z</dc:date>
    </item>
    <item>
      <title>Re: Where does the files downloaded from wget get stored in Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32338#M23563</link>
      <description>&lt;P&gt;I would prefer to use the Python requests library, which gives you full control over the download, and save the file to DBFS storage.&lt;/P&gt;&lt;P&gt;If you use wget, you can run it with the %sh magic command in a notebook cell:&lt;/P&gt;&lt;P&gt;%sh&lt;/P&gt;&lt;P&gt;wget...&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;You can then check the current working directory, which is where wget saves the file by default, with:&lt;/P&gt;&lt;P&gt;%sh&lt;/P&gt;&lt;P&gt;pwd&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;With wget it is also possible to specify the output file (the -O option); see &lt;A href="https://linux.die.net/man/1/wget" target="test_blank"&gt;https://linux.die.net/man/1/wget&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 04 Jan 2022 15:21:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32338#M23563</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-04T15:21:33Z</dc:date>
    </item>
    <item>
      <title>Re: Where does the files downloaded from wget get stored in Databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32340#M23565</link>
      <description>&lt;P&gt;Hi @Kaniz Fatma, thanks for the reminder.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Hey @Hubert Dudek, thank you very much for your prompt response.&lt;/P&gt;&lt;P&gt;Initially, I was using urllib3 to 'GET' the data at the URL, so I wanted an alternative. Unfortunately, the requests library does the same thing as urllib3.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;The question I had was: if I use the wget command, where does the downloaded data get stored?&lt;/P&gt;&lt;P&gt;I found that it is saved in the &lt;B&gt;driver's local storage&lt;/B&gt;, that is, on the driver node's local file system in the directory where wget ran.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In my case:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;'/databricks/driver'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Once I figured that out, I saved the data in DBFS, as Hubert suggested.&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;dbutils.fs.cp('file:/databricks/driver/yellow_tripdata_2020-01.csv', 'dbfs:/FileStore/tables/')&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;Thank y'all for the quick turnaround.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 11 Jan 2022 06:56:13 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/where-does-the-files-downloaded-from-wget-get-stored-in/m-p/32340#M23565</guid>
      <dc:creator>RiyazAliM</dc:creator>
      <dc:date>2022-01-11T06:56:13Z</dc:date>
    </item>
  </channel>
</rss>

