<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Listing all files under an Azure Data Lake Gen2 container in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28041#M19879</link>
    <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Here's one that might help:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;def deep_ls(path: str):
    """List all files in base path recursively."""
    for x in dbutils.fs.ls(path):
        if not x.path.endswith('/'):
            yield x
        else:
            for y in deep_ls(x.path):
                yield y
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Usage: &lt;A href="https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059" target="test_blank"&gt;https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
    <pubDate>Thu, 27 Feb 2020 17:43:12 GMT</pubDate>
    <dc:creator>StefanSchenk</dc:creator>
    <dc:date>2020-02-27T17:43:12Z</dc:date>
    <item>
      <title>Listing all files under an Azure Data Lake Gen2 container</title>
      <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28037#M19875</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I am trying to find a way to list all files in an Azure Data Lake Gen2 container. I have mounted the storage account and can list the files in a folder if I know its exact path (a container can have multiple levels of folder hierarchy). What I want is a way to list all files under all folders and subfolders in a given container. dbutils.fs.ls has no recursive option, and it does not support wildcards in the file path. How can I achieve this?&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 02 Jun 2019 11:22:04 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28037#M19875</guid>
      <dc:creator>AmitSukralia</dc:creator>
      <dc:date>2019-06-02T11:22:04Z</dc:date>
    </item>
    <item>
      <title>Re: Listing all files under an Azure Data Lake Gen2 container</title>
      <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28038#M19876</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Use the REST API?&lt;/P&gt;&lt;P&gt;There is an example in PowerShell here: &lt;A href="http://dreich.net/using-powershell-to-list-azure-datalake-gen2-contents" target="test_blank"&gt;http://dreich.net/using-powershell-to-list-azure-datalake-gen2-contents&lt;/A&gt;&lt;/P&gt;&lt;P&gt;The only authentication available for this is access keys.&lt;/P&gt;
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 18 Sep 2019 10:04:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28038#M19876</guid>
      <dc:creator>dreich</dc:creator>
      <dc:date>2019-09-18T10:04:08Z</dc:date>
    </item>
    <item>
      <title>Re: Listing all files under an Azure Data Lake Gen2 container</title>
      <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28039#M19877</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;I wrote a custom function to get all the required files. The function treats the ADL container as the root of a tree, performs an "ls" on the root, then performs an "ls" on each of its children recursively, and returns the leaf nodes (which are the required files).&lt;/P&gt;
&lt;P&gt;The base case for the recursion is a check on whether the current node's path ends with a "/". Directory paths end with a "/"; leaf nodes (files) do not.&lt;/P&gt;
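&lt;P&gt;A minimal sketch of that approach, assuming each listed entry exposes a path attribute and that directory paths end with "/". The names list_leaf_files, Entry, and FAKE_TREE are hypothetical, and the listing function is passed in as a parameter so the sketch can run outside a notebook; in Databricks you would pass dbutils.fs.ls instead.&lt;/P&gt;

```python
from collections import namedtuple
from typing import Callable, Iterable, Iterator

# Hypothetical stand-in for the entries returned by a listing call.
Entry = namedtuple("Entry", ["path", "name", "size"])


def list_leaf_files(root: str, ls: Callable[[str], Iterable[Entry]]) -> Iterator[Entry]:
    """Yield every file (leaf node) found under root, depth first."""
    for entry in ls(root):
        if entry.path.endswith("/"):
            # Directory: recurse into its children.
            yield from list_leaf_files(entry.path, ls)
        else:
            # Leaf node: a file.
            yield entry


# Tiny in-memory tree used only to demonstrate the traversal (made-up paths).
FAKE_TREE = {
    "/mnt/data/": [
        Entry("/mnt/data/a.csv", "a.csv", 10),
        Entry("/mnt/data/sub/", "sub/", 0),
    ],
    "/mnt/data/sub/": [Entry("/mnt/data/sub/b.csv", "b.csv", 20)],
}

leaf_paths = [e.path for e in list_leaf_files("/mnt/data/", FAKE_TREE.__getitem__)]
print(leaf_paths)
```

&lt;P&gt;Because the function is a generator, files are produced as each folder is listed, so a caller can stop early without walking the whole container.&lt;/P&gt;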
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 16 Dec 2019 05:20:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28039#M19877</guid>
      <dc:creator>ankitha</dc:creator>
      <dc:date>2019-12-16T05:20:57Z</dc:date>
    </item>
    <item>
      <title>Re: Listing all files under an Azure Data Lake Gen2 container</title>
      <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28040#M19878</link>
      <description>&lt;P&gt;You can create a recursive function in Python inside Databricks, something like this:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;# num and modifiedlist are globals that must exist before the first call
# (assumed here to start as num = 0 and modifiedlist = []).
def filedetails(path):
    global num
    lists = dbutils.fs.ls(path)
    for i in lists:
        # i is (path, name, size); directory names end with "/"
        if i[1][-1] == "/":
            num += 1
            lenfiles = dbutils.fs.ls(i[0])
            modifiedlist.append((i[0], i[1], i[2], len(lenfiles)))
            filedetails(i[0])
&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 27 Feb 2020 16:00:10 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28040#M19878</guid>
      <dc:creator>JithuBalan</dc:creator>
      <dc:date>2020-02-27T16:00:10Z</dc:date>
    </item>
    <item>
      <title>Re: Listing all files under an Azure Data Lake Gen2 container</title>
      <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28041#M19879</link>
      <description>&lt;P&gt;&lt;/P&gt;
&lt;P&gt;Here's one that might help:&lt;/P&gt;
&lt;PRE&gt;&lt;CODE&gt;def deep_ls(path: str):
    """List all files in base path recursively."""
    for x in dbutils.fs.ls(path):
        if not x.path.endswith('/'):
            yield x
        else:
            for y in deep_ls(x.path):
                yield y
&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;Usage: &lt;A href="https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059" target="test_blank"&gt;https://gist.github.com/Menziess/bfcbea6a309e0990e8c296ce23125059&lt;/A&gt;&lt;/P&gt;
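&lt;P&gt;The gist is not reproduced here. As a separate, hedged sketch of how the generator might be consumed: the snippet below fakes dbutils with an in-memory tree so it runs outside a notebook. FileInfo, _TREE, and the dbfs:/mnt/container/ paths are made-up stand-ins; in a real notebook dbutils already exists and only the last three statements would be needed.&lt;/P&gt;

```python
from collections import namedtuple
from types import SimpleNamespace

# Stand-in for the entries dbutils.fs.ls returns (path, name, size).
FileInfo = namedtuple("FileInfo", ["path", "name", "size"])

# Made-up folder layout; directory paths end with "/".
_TREE = {
    "dbfs:/mnt/container/": [
        FileInfo("dbfs:/mnt/container/raw/", "raw/", 0),
        FileInfo("dbfs:/mnt/container/readme.txt", "readme.txt", 5),
    ],
    "dbfs:/mnt/container/raw/": [
        FileInfo("dbfs:/mnt/container/raw/2020.csv", "2020.csv", 100),
        FileInfo("dbfs:/mnt/container/raw/2019.csv", "2019.csv", 50),
    ],
}

# Fake dbutils so the example is self-contained; a notebook provides the real one.
dbutils = SimpleNamespace(fs=SimpleNamespace(ls=lambda path: _TREE[path]))


def deep_ls(path: str):
    """List all files in base path recursively."""
    for x in dbutils.fs.ls(path):
        if not x.path.endswith("/"):
            yield x
        else:
            yield from deep_ls(x.path)


# Materialize the generator, then filter and aggregate the results.
files = list(deep_ls("dbfs:/mnt/container/"))
csv_paths = [f.path for f in files if f.name.endswith(".csv")]
total_bytes = sum(f.size for f in files)
```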
&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Thu, 27 Feb 2020 17:43:12 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28041#M19879</guid>
      <dc:creator>StefanSchenk</dc:creator>
      <dc:date>2020-02-27T17:43:12Z</dc:date>
    </item>
    <item>
      <title>Re: Listing all files under an Azure Data Lake Gen2 container</title>
      <link>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28042#M19880</link>
      <description>&lt;P&gt;&lt;A href="https://community.databricks.com/s/contentdocument/0693f000007PPdGAAW" alt="https://community.databricks.com/s/contentdocument/0693f000007PPdGAAW" target="_blank"&gt;stackoverflow.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.databricks.com/s/contentdocument/0693f000007PPdLAAW" alt="https://community.databricks.com/s/contentdocument/0693f000007PPdLAAW" target="_blank"&gt;files.txt&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
      <pubDate>Sun, 22 Mar 2020 17:04:37 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/listing-all-files-under-an-azure-data-lake-gen2-container/m-p/28042#M19880</guid>
      <dc:creator>Balaji_su</dc:creator>
      <dc:date>2020-03-22T17:04:37Z</dc:date>
    </item>
  </channel>
</rss>

