queries are running extremely slow

nathan45shafer
New Contributor

Hello everyone,

I’m encountering an issue when querying large Parquet files in Databricks, particularly with files exceeding 1 GB in size. The queries are running extremely slow, and at times, they even time out. I’ve tried optimizing the file size and partitioning strategy, but the problem persists.

Has anyone faced a similar issue, or does anyone have insights on optimizing performance for large Parquet files in Databricks?

Thanks in advance.

Alberto_Umana
Databricks Employee

Hello @nathan45shafer,

Thanks for your question. You can refer to https://www.databricks.com/discover/pages/optimize-data-workloads-guide, which covers good practices and actions to optimize your workflow. Please let me know if you have questions.
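For context on why file size and partitioning matter for a file over 1 GB, here is a rough sketch of how Spark appears to size its read splits for splittable files such as Parquet. It is an approximation in plain Python, not Spark's actual code: the function name `estimate_splits` and the `default_parallelism` value are hypothetical, and the defaults assume `spark.sql.files.maxPartitionBytes` = 128 MB and `spark.sql.files.openCostInBytes` = 4 MB. Note that Parquet is read per row group, so a file written with very large row groups may get far less read parallelism than this estimate suggests.

```python
import math

MB = 1024 * 1024


def estimate_splits(file_sizes_bytes,
                    max_partition_bytes=128 * MB,   # assumed default of spark.sql.files.maxPartitionBytes
                    open_cost_bytes=4 * MB,         # assumed default of spark.sql.files.openCostInBytes
                    default_parallelism=8):         # hypothetical: total cores in the cluster
    """Rough estimate of the number of input splits planned for a set of files.

    Mirrors the commonly described rule:
      max_split = min(maxPartitionBytes, max(openCostInBytes, bytes_per_core))
    Spark may afterwards pack small splits together, so treat this as an upper bound.
    """
    total = sum(size + open_cost_bytes for size in file_sizes_bytes)
    bytes_per_core = total / default_parallelism
    max_split = min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))
    return sum(math.ceil(size / max_split) for size in file_sizes_bytes)


if __name__ == "__main__":
    one_gib_file = [1024 * MB]
    # With the assumed defaults, a single 1 GiB file is cut into 8 splits of 128 MB.
    print(estimate_splits(one_gib_file))
```

If the estimate is much lower than the number of cores available, or the Spark UI shows only a handful of long-running scan tasks, that points at split size or row group size rather than at the query itself.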