Databricks Community

hanifmusa · 3 weeks ago

I am exporting Parquet files (partitioned by id) in append mode. The job occasionally fails with the error below, while other times it completes successfully.

Apache Spark Exception: Exception thrown in awaitResult: hadoop_azure_shaded.com.microsoft.azure.storage.StorageException: The specified blob does not exist.

Currently, the storage access is configured as follows: `wasbs://<container-name>@<storage-account-name>.blob.core.windows.net/<directory-name>`

This is for exporting in append mode. Can anyone help?

balajij8 · 3 weeks ago

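It's generally due to a race condition: Spark checks for existing partition files before writing, and file operations through the wasbs:// driver on Azure Blob Storage (directory renames in particular) are not atomic, so a file seen during the check can be gone by the time it is read.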

You can try the following:

1. Switch to Delta Lake - Use the Delta Lake format instead of plain Parquet for append-mode writes. Delta handles concurrency and append operations reliably.

2. Use the ABFS/ABFSS protocol with Azure Data Lake Storage - Switch from wasbs:// to abfss://, as it has better consistency guarantees. It requires your storage account to have hierarchical namespace enabled (ADLS Gen2), so enable it first. Use Unity Catalog volumes if feasible.

spark.read.load("abfss://container@storageaccount.dfs.core.windows.net/data_path")


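3. Use overwrite mode with dynamic partition overwrite (spark.sql.sources.partitionOverwriteMode set to dynamic) if append semantics aren't strictly required per partition. Only the partitions present in the data being written are replaced.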

4. Add Retry Logic - Wrap the write operation with retry logic to handle transient Azure Storage errors.

5. Check Storage Configuration - Ensure you are using the latest Hadoop Azure connector version and review the storage account's configuration.
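
For option 4, here is a minimal sketch of a retry wrapper with exponential backoff. The helper name write_with_retry and its parameters are hypothetical, not part of any Spark or Databricks API, and it assumes every exception is transient, which you would probably want to narrow in practice.

```python
import time


def write_with_retry(write_fn, max_attempts=3, base_delay_s=2.0, sleep=time.sleep):
    """Call write_fn(); on failure, wait and retry, doubling the delay each time.

    write_fn is any zero-argument callable that performs the write.
    The last failure is re-raised once max_attempts is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return write_fn()
        except Exception as exc:  # assumption: any exception is treated as transient
            if attempt == max_attempts:
                raise
            delay = base_delay_s * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Hypothetical usage with a Spark DataFrame `df` and a `target_path` string:
# write_with_retry(
#     lambda: df.write.mode("append").partitionBy("id").parquet(target_path)
# )
```

Keep in mind that a plain Parquet append is not transactional, so a retry after a partially completed attempt may leave duplicate files behind. That is one more reason to prefer option 1.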

Delta Lake is the option I would recommend: it is ACID-compliant, handles concurrent writes safely, and is the best fit for production workloads on Databricks.
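
Options 1 and 2 combined could look roughly like the sketch below. Both function names (wasbs_to_abfss, append_as_delta) are hypothetical helpers, and the default partition column id is taken from the question. It assumes the target is a new or already-Delta location, since an existing plain Parquet directory would need converting first, and that access to the abfss:// endpoint is already configured on the cluster.

```python
from urllib.parse import urlsplit


def wasbs_to_abfss(uri: str) -> str:
    """Rewrite a wasb(s):// URI to the matching abfss:// URI (blob -> dfs endpoint)."""
    parts = urlsplit(uri)
    if parts.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a wasb(s) URI: {uri}")
    host = parts.netloc.replace(".blob.core.windows.net", ".dfs.core.windows.net")
    return f"abfss://{host}{parts.path}"


def append_as_delta(df, target_uri: str, partition_col: str = "id") -> None:
    """Append a Spark DataFrame to a Delta table at target_uri, partitioned by partition_col."""
    (
        df.write
        .format("delta")        # Delta instead of plain Parquet
        .mode("append")         # same append semantics as the original job
        .partitionBy(partition_col)
        .save(target_uri)
    )


# Hypothetical usage with a Spark DataFrame `df`:
# target = wasbs_to_abfss("wasbs://container@storageaccount.blob.core.windows.net/data_path")
# append_as_delta(df, target)
```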
