Unable to Analyze External Delta tables due to failed to initialize filesystem

RobCox
New Contributor

Hello,

I recently noticed that we have never run ANALYZE TABLE. I found this while investigating Z-ordering and liquid clustering, when I saw that the query plans for our Delta tables were not taking either into account.

I'm trying to run the following command to compute statistics for our Delta tables:

spark.sql("ANALYZE TABLE delta.`my_table_path` COMPUTE DELTA STATISTICS")
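For readers who want to build this statement from a path variable, here is a minimal sketch. The helper names and the example path are hypothetical and not from the original post; the only assumption about Spark SQL is that a backtick inside a quoted identifier is escaped by doubling it.

```python
def delta_path_identifier(path: str) -> str:
    """Wrap a storage path as delta.`<path>`, doubling any embedded backticks."""
    return "delta.`" + path.replace("`", "``") + "`"


def analyze_statement(path: str) -> str:
    """Build the ANALYZE statement quoted in the post for a path-based Delta table."""
    return f"ANALYZE TABLE {delta_path_identifier(path)} COMPUTE DELTA STATISTICS"


# Hypothetical abfss path, used only for illustration.
example_path = "abfss://container@account.dfs.core.windows.net/tables/my_table"
statement = analyze_statement(example_path)
print(statement)

# In a notebook with an active session, this would be passed to Spark:
# spark.sql(statement)
```

Keeping the quoting in one helper means the same identifier can be reused for other path-based commands, such as DESCRIBE DETAIL.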

my_table_path is an abfss path; we are not currently using Unity Catalog.

The error we receive is:

WARN FileSystem: Failed to initialize filesystem my_table_path: Failure to initialize configuration for storage account XXXXXXXX.dfs.core.windows.net: Invalid configuration value detected for fs.azure.account.keyInvalid configuration value detected for fs.azure.account.key
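One way to narrow down an error like this is to work out which ABFS configuration keys apply to the path and check where they are actually set. The sketch below is only an illustration: the helper names and the example path are hypothetical, and the idea that the credentials may be visible to some code paths but not others is an assumption, not something confirmed in this thread. The key names follow the Hadoop ABFS convention of suffixing the storage account host.

```python
from urllib.parse import urlparse


def storage_host(abfss_path: str) -> str:
    """Extract '<account>.dfs.core.windows.net' from an abfs(s) URI."""
    parsed = urlparse(abfss_path)
    if parsed.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an abfs(s) path: {abfss_path}")
    # netloc has the form '<container>@<account>.dfs.core.windows.net'
    return parsed.netloc.split("@", 1)[-1]


def abfs_config_keys(abfss_path: str) -> list:
    """Account-scoped ABFS keys that an account-key setup would typically rely on."""
    host = storage_host(abfss_path)
    return [
        f"fs.azure.account.auth.type.{host}",
        f"fs.azure.account.key.{host}",
    ]


def report_config(conf_get, abfss_path: str) -> dict:
    """Map each key to True/False depending on whether conf_get(key, None) finds it."""
    return {key: conf_get(key, None) is not None for key in abfs_config_keys(abfss_path)}


# Demo with a plain dict standing in for a configuration source (hypothetical values).
demo_path = "abfss://container@account.dfs.core.windows.net/tables/my_table"
demo_conf = {"fs.azure.account.key.account.dfs.core.windows.net": "<redacted>"}
demo_report = report_config(demo_conf.get, demo_path)
for key, is_set in demo_report.items():
    print(f"{key}: {'set' if is_set else 'missing'}")

# In a notebook, the session configuration could be checked the same way:
# report_config(spark.conf.get, demo_path)
```

If a key shows as set at the session level but the command still fails, that would be consistent with (though not proof of) the command reading its credentials from somewhere other than the session configuration.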

However, we can successfully run other commands against this table path, such as:

spark.sql("DESCRIBE DETAIL delta.`my_table_path`").show()

Reading, writing, and running OPTIMIZE against the path also all work, and I was able to deep clone the source data to this location to do this testing.

Does anyone know what might be at play here? For example, does ANALYZE need elevated permissions on the blob storage that we don't have?

I also believe that not running ANALYZE is the reason our execution plans are not being optimized to use Z-ordering or liquid clustering. Is that a correct assumption? Currently the execution plans ignore both, even after running OPTIMIZE.

Thanks in advance if you're able to look at this!

1 REPLY

Kaniz_Fatma
Community Manager

Hi @RobCox, this might be due to incorrect configuration settings or insufficient permissions. Ensure that the fs.azure.account.key configuration is accurate and that the service principal or identity running the command has the necessary permissions.

The ANALYZE TABLE command is essential for optimizing query plans: it collects statistics, which help the optimizer make effective use of features like Z-Ordering and Liquid Clustering.
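As one way to act on the configuration advice above, the account key can be supplied in the cluster's Spark config rather than per session. This sketch only generates the config line: the function name, storage account, secret scope, and secret name are all hypothetical, and whether a cluster-level setting resolves this particular error is an assumption that the thread does not confirm. It relies on the `spark.hadoop.` prefix, which passes a property through to the Hadoop configuration, and on the Databricks `{{secrets/<scope>/<name>}}` reference syntax for Spark config values.

```python
def cluster_account_key_line(storage_account: str, secret_scope: str, secret_name: str) -> str:
    """Build a '<key> <value>' line for the cluster Spark config box.

    The value is a secret reference, so the account key itself never
    appears in the cluster definition.
    """
    host = f"{storage_account}.dfs.core.windows.net"
    key = f"spark.hadoop.fs.azure.account.key.{host}"
    value = f"{{{{secrets/{secret_scope}/{secret_name}}}}}"
    return f"{key} {value}"


# Hypothetical names, for illustration only.
line = cluster_account_key_line("mystorageaccount", "my-scope", "storage-key")
print(line)
```

After adding such a line and restarting the cluster, re-running the ANALYZE command would show whether the credentials were the issue.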
