cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

How do I know if the number of files are causing performance issues?

User16826992666
Valued Contributor

I have read and heard that having too many small files can cause performance problems when reading large data sets. But how do I know if that is an issue I am facing?

1 REPLY 1

sajith_appukutt
Honored Contributor II

Databricks SQL endpoint has a query history section which provides additional information to debug / tune queries. One such metric under execution details is the number of files read.query-metrics 

For ETL/Data science workloads, you could use the Spark UI of the cluster and click on "Query Details" to get this info.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.