<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: FileNotFoundError when using sftp to write to disk within jobs in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32632#M23780</link>
    <description>&lt;P&gt;The SFTP connection is made with a password:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def connect_to_sftp(host: str, port: int, username: str, password: str) -&amp;gt; paramiko.sftp_client.SFTPClient:
    sftp, transport = None, None
    try:
        transport = paramiko.Transport(host, port)
        transport.connect(username=username, password=password)
    except Exception as e:
        print(e)
        if transport is not None:
            transport.close()
    try:
        sftp = paramiko.SFTPClient.from_transport(transport)
        return sftp
    except Exception as e:
        print(e)
        if sftp is not None:
            sftp.close()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;sftp_object is the name of the remote object:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;for file in sftp.listdir(sftp_dir):
    sftp_object = f'{sftp_dir}/{file}'
    if dryrun:
        print(sftp_object)
    sftp_read(sftp_object, prefix)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Sun, 04 Sep 2022 06:40:23 GMT</pubDate>
    <dc:creator>akdm</dc:creator>
    <dc:date>2022-09-04T06:40:23Z</dc:date>
    <item>
      <title>FileNotFoundError when using sftp to write to disk within jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32630#M23778</link>
      <description>&lt;P&gt;When I convert a notebook into a job, I frequently run into an issue with writing to the local filesystem. For this particular example, I did all my notebook testing with a byte stream for small files. When I ran it as a job, I used the method I had for saving the download to disk, but I keep getting a `FileNotFoundError`. An example snippet with the two methods I've tried is below:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;# method 1
def sftp_read(sftp_object, prefix):
    key = f'{prefix}/{sftp_object}'
    if not os.path.exists('/local_disk0/tmp/'):
        os.makedirs('/local_disk0/tmp/')
    sftp.get(sftp_object, f'/local_disk0/tmp/{sftp_object}')
    # do stuff
    os.remove(f'/local_disk0/tmp/{sftp_object}')
&amp;nbsp;
# method 2
def sftp_read(sftp_object, prefix):
    key = f'{prefix}/{sftp_object}'
    dbutils.fs.mkdirs('file:/tmp/')
    sftp.get(sftp_object, f'file:/tmp/{sftp_object}')
    # do stuff
    dbutils.fs.rm(f'file:/tmp/{sftp_object}')
        
&amp;nbsp;
FileNotFoundError                         Traceback (most recent call last)
&amp;lt;command-3394785040378964&amp;gt; in &amp;lt;cell line: 1&amp;gt;()
      3     if dryrun:
      4         print(sftp_object)
----&amp;gt; 5     sftp_read(sftp_object, prefix)
&amp;nbsp;
&amp;lt;command-3394785040378909&amp;gt; in sftp_read(sftp_object, prefix)
     57             if not os.path.exists('/local_disk0/tmp/'):
     58                 os.makedirs('/local_disk0/tmp/')
---&amp;gt; 59             sftp.get(sftp_object, f'/local_disk0/tmp/{sftp_object}')
     60             # do stuff
     61             os.remove(f'/local_disk0/tmp/{sftp_object}')
&amp;nbsp;
/local_disk0/.ephemeral_nfs/envs/pythonEnv-1e9ce7e1-d7d5-4473-b8d6-dbe59be12302/lib/python3.9/site-packages/paramiko/sftp_client.py in get(self, remotepath, localpath, callback, prefetch)
    808             Added the ``prefetch`` keyword argument.
    809         """
--&amp;gt; 810         with open(localpath, "wb") as fl:
    811             size = self.getfo(remotepath, fl, callback, prefetch)
    812         s = os.stat(localpath)
&amp;nbsp;
FileNotFoundError: [Errno 2] No such file or directory: '/local_disk0/tmp/path/to/file.ext'&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;I have also referenced the DBFS local-files documentation: &lt;A href="https://docs.databricks.com/files/index.html" target="_blank"&gt;https://docs.databricks.com/files/index.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any suggestions, or is there something I should know about jobs running differently from notebooks?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 15:20:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32630#M23778</guid>
      <dc:creator>akdm</dc:creator>
      <dc:date>2022-09-02T15:20:57Z</dc:date>
    </item>
    <item>
      <title>Re: FileNotFoundError when using sftp to write to disk within jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32632#M23780</link>
      <description>&lt;P&gt;The SFTP connection is made with a password:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def connect_to_sftp(host: str, port: int, username: str, password: str) -&amp;gt; paramiko.sftp_client.SFTPClient:
    sftp, transport = None, None
    try:
        transport = paramiko.Transport(host, port)
        transport.connect(username=username, password=password)
    except Exception as e:
        print(e)
        if transport is not None:
            transport.close()
    try:
        sftp = paramiko.SFTPClient.from_transport(transport)
        return sftp
    except Exception as e:
        print(e)
        if sftp is not None:
            sftp.close()&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;sftp_object is the name of the remote object:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;for file in sftp.listdir(sftp_dir):
    sftp_object = f'{sftp_dir}/{file}'
    if dryrun:
        print(sftp_object)
    sftp_read(sftp_object, prefix)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 04 Sep 2022 06:40:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32632#M23780</guid>
      <dc:creator>akdm</dc:creator>
      <dc:date>2022-09-04T06:40:23Z</dc:date>
    </item>
    <item>
      <title>Re: FileNotFoundError when using sftp to write to disk within jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32633#M23781</link>
      <description>&lt;P&gt;I was able to fix it. It was an issue with nested files on the SFTP server: the local parent folders had to be created as well. Splitting the local path from the filename made it easier to check that the directory exists with os.path.exists() and create it with os.makedirs().&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;def sftp_read(sftp_object, bucket, prefix):
    key = f'{prefix}/{sftp_object}'
    local_path = '/local_disk0/tmp'
    local_file = f'{local_path}/{os.path.basename(sftp_object)}'
    if not os.path.exists(local_path):
        os.makedirs(local_path)
    sftp.get(sftp_object, local_file)
    # do stuff
    os.remove(local_file)&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;All in all, not a Databricks issue, just an issue that surfaced on Databricks.&lt;/P&gt;</description>
      <pubDate>Wed, 07 Sep 2022 16:10:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32633#M23781</guid>
      <dc:creator>akdm</dc:creator>
      <dc:date>2022-09-07T16:10:42Z</dc:date>
    </item>
    <item>
      <title>Re: FileNotFoundError when using sftp to write to disk within jobs</title>
      <link>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32631#M23779</link>
      <description>&lt;P&gt;Hi, thanks for reaching out to community.databricks.com.&lt;/P&gt;&lt;P&gt;Could you please also mention where you declared sftp_object? Also, how did you set up the SFTP connection: with a password, or passwordless?&lt;/P&gt;</description>
      <pubDate>Fri, 02 Sep 2022 20:25:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/filenotfounderror-when-using-sftp-to-write-to-disk-within-jobs/m-p/32631#M23779</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2022-09-02T20:25:21Z</dc:date>
    </item>
  </channel>
</rss>

