Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks unable to list ADLS folder and files

CG29
New Contributor

Hi Databricks Community,

I am able to list the container from my Databricks workspace but unable to list the folders and files inside it.
If I access the same files and folders from the Databricks UI through the external location path, I can see all of them.
The query below returns output:
dbutils.fs.ls("abfss://***@*****.dfs.core.windows.net/")

But if I run the dbutils query below, it runs indefinitely without any output:
dbutils.fs.ls("abfss://***@*****.dfs.core.windows.net/foldername")

Please help me identify what I am missing here. Unable to read any files from my ADLS Gen2 Storage. Thanks !!

1 ACCEPTED SOLUTION

Accepted Solutions

ShamenParis
New Contributor II

Hi there!

Assuming your permissions are fully set up correctly, an indefinite hang (rather than a quick "403 Forbidden" error) usually points to either a timeout or a dropped network connection.
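
One way to turn a silent hang into a clear signal is to run the listing with a deadline. The following is a minimal sketch, assuming it is pasted into a Databricks notebook where `dbutils` is available; `call_with_timeout` is a hypothetical helper written for this illustration, not part of dbutils.

```python
import threading


def call_with_timeout(fn, timeout_s, *args):
    """Run fn(*args) in a background thread and wait at most timeout_s seconds.

    Returns ("ok", value), ("error", exception) or ("timeout", None).
    The worker is a daemon thread: on timeout it is not cancelled, it is
    simply no longer waited for.
    """
    outcome = {}

    def worker():
        try:
            outcome["value"] = fn(*args)
        except Exception as exc:  # report the failure instead of losing it
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout_s)

    if thread.is_alive():
        return ("timeout", None)
    if "error" in outcome:
        return ("error", outcome["error"])
    return ("ok", outcome["value"])


# Usage in a notebook (the path is a placeholder, not a real account):
# status, value = call_with_timeout(
#     dbutils.fs.ls, 60,
#     "abfss://<container>@<account>.dfs.core.windows.net/foldername/",
# )
# print(status)
```

A "timeout" status only says the call did not return in time; it does not say why. An "error" status carrying a permission exception would point toward access control rather than networking.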

Here are three common culprits you might want to check out:

  • The folder has too many files: The Databricks UI uses pagination (it only loads the first 50–100 files at a time), which makes it load quickly. However, dbutils.fs.ls() is synchronous and returns the entire listing to the driver in one go. If the folder holds a very large number of files, the call can take long enough to look like a hang. Try testing the command on a much smaller subfolder.

  • The missing trailing slash: Azure can be picky about paths. If you run it without a slash at the end (.../foldername), the path may first be checked as a single file named "foldername" before it is treated as a directory, which can add extra work. Try running it with the slash at the end (.../foldername/) and see if it runs normally.

  • A hidden firewall/networking rule: The UI and your notebook actually run on different networks. The UI uses the Databricks Control Plane, while your notebook uses your cluster's specific VNet (the Data Plane). Your ADLS account might be set to "Allow Azure Services" (which lets the UI work), but a firewall or Network Security Group might be silently dropping the packets from your cluster's VNet, causing it to just hang while waiting for a response.
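
The networking point can be probed directly from a notebook on the affected cluster. This is a rough sketch using only the Python standard library; `check_tcp` is a hypothetical helper, and the host name in the usage comment is a placeholder for your own storage account.

```python
import socket


def check_tcp(host, port=443, timeout_s=5.0, connect=socket.create_connection):
    """Try to open a TCP connection and classify the result.

    Returns "open", "timeout" or "error: <details>".
    The `connect` parameter exists only so the function can be tested
    without a network; by default it is socket.create_connection.
    """
    try:
        conn = connect((host, port), timeout=timeout_s)
    except socket.timeout:
        # No answer at all within the deadline.
        return "timeout"
    except OSError as exc:
        # Covers DNS failures, refused connections, unreachable networks.
        return f"error: {exc}"
    conn.close()
    return "open"


# Usage in a notebook (placeholder host):
# print(check_tcp("<account>.dfs.core.windows.net"))
```

A "timeout" result is consistent with packets being dropped somewhere between the cluster and the storage endpoint, while "open" suggests the network path is fine and the cause lies elsewhere.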

Hope one of these points you in the right direction!

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.


5 REPLIES

Ashwin_DSA
Databricks Employee

Hi @CG29,

I suspect the main thing missing here is that these two access paths aren't always using the same permission model. When you browse the path from the Databricks UI as an external location, that access is typically governed through Unity Catalog using an external location plus its storage credential, whereas a direct dbutils.fs.ls("abfss://...") call is raw path-based access to cloud storage and therefore depends on the permissions available for that direct URI access path. Databricks calls out both patterns in the public docs and generally recommends using Unity Catalog-managed access where possible rather than relying on raw cloud URIs for day-to-day access. See Work with files on Azure Databricks, Connect to cloud object storage using Unity Catalog, and Connect to an Azure Data Lake Storage Gen2 external location.

So if the container root lists successfully but abfss://.../foldername hangs, the most likely explanation is that the identity used for the direct abfss call can see the container itself but lacks sufficient ADLS Gen2 permissions on that subdirectory or one of its parent paths. In ADLS Gen2, listing a directory requires read and execute permissions on that directory, and traversing into nested paths requires execute permissions on the parent directories as well. Microsoft documents that behaviour in the ACL guidance here: Access control lists in Azure Data Lake Storage.
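
To make the ACL rule above concrete, here is a small sketch that spells out which permission each level of a path would need for a listing to succeed. `required_acls` is a hypothetical helper for illustration only; it covers the ACL side described above and does not account for Azure RBAC role assignments, which can grant access independently of ACLs.

```python
def required_acls(directory_path):
    """List the minimum ACL permissions needed to list `directory_path`.

    Every ancestor directory (including the container root "/") needs
    execute (--x) so the identity can traverse it; the directory being
    listed needs read and execute (r-x).
    """
    parts = [p for p in directory_path.strip("/").split("/") if p]

    # The root is either just traversed, or is itself the listing target.
    needs = [("/", "--x" if parts else "r-x")]

    for i in range(len(parts)):
        path = "/" + "/".join(parts[: i + 1])
        is_target = i == len(parts) - 1
        needs.append((path, "r-x" if is_target else "--x"))
    return needs


for path, perm in required_acls("foldername"):
    print(f"{perm}  {path}")
```

Comparing this list with the ACLs actually set on each level for the identity the cluster uses shows where traversal or listing would be denied.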

In other words, being able to see the location in the Databricks UI does not necessarily prove that the same notebook call over abfss:// has the required permissions end-to-end. I would first verify which identity is actually being used by the cluster for direct ADLS access, and then check the ACLs on foldername and its parent directories to make sure that identity has the required traverse and list permissions. If ACLs were only set at a higher level, it is also worth checking whether they were ever propagated recursively to existing child folders and files. Microsoft’s ADLS ACL docs cover that as well in the CLI guidance for recursive ACL updates.

It is also worth trying the path with a trailing slash, since Databricks examples for directory listings use that form, for example dbutils.fs.ls("abfss://container@account.dfs.core.windows.net/path/"). That probably isn’t the root cause here, but it is a quick sanity check.

Hope this helps.

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***

CG29
New Contributor

Thanks, Shamen Paris!!
The first two points were ruled out, as the folder structure is not that complex and holds very little data. I had also added the trailing slash in the query; leaving it out was a typo in my question.

I was confused about why I am able to browse all files from the UI but not from the cluster.
The third point did hint at the cluster network. I checked the connectivity between the cluster network and the private endpoint (PEP) of the storage account; the two are hosted in different VNets. Enabling VNet peering resolved the issue. Thank you!!
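
For anyone hitting the same situation, a quick check from a notebook on the cluster is to see what the storage host name resolves to. This is a sketch only, using the Python standard library; `resolve_endpoint` is a hypothetical helper and the host name in the usage comment is a placeholder.

```python
import ipaddress
import socket


def resolve_endpoint(host, resolver=socket.getaddrinfo):
    """Resolve `host` and report whether each address is private.

    Returns a sorted list of (address, is_private) tuples.
    The `resolver` parameter exists only so the function can be tested
    without DNS; by default it is socket.getaddrinfo.
    """
    infos = resolver(host, 443, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address as a string.
    addresses = sorted({info[4][0] for info in infos})
    return [(addr, ipaddress.ip_address(addr).is_private) for addr in addresses]


# Usage in a notebook (placeholder host):
# print(resolve_endpoint("<account>.dfs.core.windows.net"))
```

If a private endpoint is in place, a private address here suggests DNS is pointing at it; the cluster's VNet still needs a route to that address, which is what peering provided in this case.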

ShamenParis
New Contributor II

@CG29, I'm glad that helped.

ashukasma
New Contributor II

The following may be the causes:
1. Different authentication methods
- The UI's external location uses Unity Catalog credentials
- Your dbutils.fs.ls() command uses the compute's Spark configurations
- These may be using different credentials with different permissions


2. Missing Spark configurations
- Your compute might not have the necessary ADLS Gen2 authentication configs

3. Credential permissions
- The credential might have container-level LIST but not deeper TRAVERSE/READ permissions

 

How to Diagnose

1. Check compute type

  • Are you using serverless, shared, or single-user compute?
  • Unity Catalog-enabled clusters handle credentials differently

Solutions
Option 1: Use Unity Catalog Volumes

Option 2: Add Spark configurations to your compute
Add these to your cluster's Spark Config (Cluster → Configuration → Advanced Options)
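
The original post does not show the configuration values themselves. As an illustration only, the sketch below builds the Hadoop ABFS settings commonly used for service principal (OAuth client credentials) access. All argument values are placeholders, `adls_oauth_spark_conf` is a hypothetical helper, and whether service principal authentication is the right method for your workspace is an assumption. Depending on how the cluster reads the settings, the keys may also need a `spark.hadoop.` prefix.

```python
def adls_oauth_spark_conf(storage_account, client_id, tenant_id,
                          secret_scope, secret_key):
    """Build ABFS OAuth settings for one storage account.

    The client secret is written as a Databricks secret reference
    ({{secrets/<scope>/<key>}}) rather than as a literal value.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}":
            "{{secrets/%s/%s}}" % (secret_scope, secret_key),
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


def as_spark_config_lines(conf):
    """Render the settings as "key value" lines for the Spark Config box."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Placeholder values; replace with your own before use.
print(as_spark_config_lines(adls_oauth_spark_conf(
    "<storage-account>", "<application-id>", "<directory-id>",
    "<secret-scope>", "<secret-key>")))
```

The printed lines can be pasted into the Spark Config field. The service principal still needs the storage permissions discussed earlier in the thread.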

Aashish Kasma | CTO & Cofounder, Lucent Innovation