
Frequent “GetPathStatus” and “GetBlobProperties” PathNotFound Errors on Azure Storage in Databricks

h_h_ak
Contributor

We are encountering frequent GetPathStatus and GetBlobProperties errors when trying to access Azure Data Lake Storage (ADLS) paths through our Databricks environment. The errors consistently return a 404 PathNotFound status for paths that should be accessible.

Context:

  • Operation: DataFrame write and read operations on Databricks, attempting to access Azure storage paths.
  • Storage Path: /stxxxx/src-sapecc/ and other related paths in Azure Data Lake Gen2.
  • Errors Observed:
      • GetPathStatus: PathNotFound
      • GetBlobProperties: BlobNotFound

Error Count: The errors are recurring frequently, as seen in the attached logs, which indicate multiple instances of the PathNotFound error with status code 404.

Timestamps: Errors occur across multiple timestamps (see attached logs for details).

Attached Screenshot: Logs showing details of the error, including the operation name, status codes, and paths.

Could you please assist in identifying why these PathNotFound and BlobNotFound errors occur despite correct configuration and permissions? Additionally, if any further configuration is required on the Azure or Databricks side to resolve this, please advise. Thanks in advance.

 

[Attached screenshot: h_h_ak_0-1729867508203.png]

 


saurabh18cs
Contributor II

Hi,

1) Ensure that the paths you are trying to access are correct and exist in the ADLS Gen2 storage account.

2) Verify that the Databricks cluster has the necessary permissions to access the ADLS Gen2 paths.
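
For example, a quick sanity check of both from a notebook might look like this (the account and container names are placeholders, adjust them to your setup):

    # Probe the path directly; dbutils.fs.ls raises an exception if the path
    # does not exist or the cluster lacks permission to list it.
    path = "abfss://src-sapecc@stxxxx.dfs.core.windows.net/"
    try:
        entries = dbutils.fs.ls(path)
        print(f"Path reachable, {len(entries)} entries found")
    except Exception as e:
        print(f"Path or permission problem: {e}")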

Br

 

h_h_ak
Contributor

Hi,

I can confirm that I have checked this, and everything is in order: both the paths and the permissions are definitely in place, as we are also successfully writing data to the container. I noticed that these messages come up in several situations:

1. In SQL Warehouse queries

2. During spark.read... operations

3. During spark.write... operations

We are using DBR 13.3 in this workspace. Any ideas on why so many storage-related messages are appearing? It only started happening after we enabled diagnostic settings in Azure.
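
For context, even a plain Parquet read/write pair like the one below (hypothetical paths) is enough to produce these messages in the diagnostic logs:

    # Nothing Delta-specific here, yet the storage diagnostic logs still show
    # GetPathStatus / GetBlobProperties probes against these paths and their parents.
    df = spark.read.parquet("abfss://data@stxxxx.dfs.core.windows.net/src-sapecc/")
    df.write.mode("overwrite").parquet("abfss://data@stxxxx.dfs.core.windows.net/tmp/out/")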

aashish122
New Contributor III

Testing this in incognito mode will help!

h_h_ak
Contributor

Why do you think this will help? We have the Spark and connection configuration in the cluster settings, and the spark.read... and write statements are executed by the notebook. Additionally, the SQL queries come from outside. How would incognito mode help in this scenario?

h_h_ak
Contributor

Adding the answer from the MSFT Support team:

Why is _delta_log being checked when the format used is Parquet?
The _delta_log directory is checked because the system is designed to scan directories and their parent directories for a Delta log folder. This ensures that if a user writes to a Delta table using the wrong format (e.g., Parquet instead of Delta), the system can identify the mistake and fail the job to prevent data corruption.
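
For illustration, a minimal sketch of the failure mode this check guards against (the /mnt/demo/events path is hypothetical):

    # Create a Delta table; this writes a _delta_log directory under the path.
    df = spark.range(10)
    df.write.format("delta").save("/mnt/demo/events")

    # A later write with the wrong format is rejected: the _delta_log probe finds
    # the transaction log, and the job fails with an AnalysisException instead of
    # silently mixing plain Parquet files into the Delta table.
    df.write.format("parquet").mode("append").save("/mnt/demo/events")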

Why do all parent folders get _delta_log calls?
The system recursively checks all parent directories for the _delta_log folder to determine whether any of them are Delta tables. This is part of the design to ensure that the correct table format is being used and to avoid potential issues with data integrity.
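
As a rough illustration, for a nested write target this probing amounts to one lookup per directory level (hypothetical path; each missing folder shows up in the diagnostic logs as a 404 PathNotFound):

    # Print the _delta_log locations probed for a nested target path,
    # walking from the target up to the container root.
    target = "abfss://data@stxxxx.dfs.core.windows.net/src-sapecc/2024/10/25"
    parts = target.split("/")
    for i in range(len(parts), 2, -1):
        print("/".join(parts[:i]) + "/_delta_log")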

What are the files _encryption_metadata/manifest.json and _spark_metadata being referenced for, given that they are not present in the folders?
The _encryption_metadata/manifest.json file is checked to determine whether encryption is enabled on the storage.
The _spark_metadata directory is typically created by streaming jobs to store metadata about the stream. Even though these files may not be present in the folders, the system checks for them as part of its standard operations.

How can these requests be removed?
Currently, there is no direct way to remove these requests, as they are part of the system's design to ensure data integrity and correct table-format usage.
