Hello everyone,
I’m encountering a performance issue when querying large Parquet files in Databricks, particularly files exceeding 1 GB. Queries run extremely slowly, and at times they even time out. I’ve tried tuning the file sizes and the partitioning strategy, but the problem persists.
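For context, here is roughly what I tried so far. The paths and the `event_date` column below are placeholders standing in for my actual data; the idea was to rewrite the data partitioned on a frequently filtered column and cap the records per output file so the resulting Parquet files are neither tiny nor enormous:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder input path; my real data lives elsewhere.
df = spark.read.parquet("/mnt/raw/events")

(
    df.repartition("event_date")              # placeholder partition column
      .write
      .partitionBy("event_date")              # prune partitions on date filters
      .option("maxRecordsPerFile", 5_000_000) # indirectly cap file size
      .mode("overwrite")
      .parquet("/mnt/curated/events")
)
```

Even after this rewrite, queries filtering on the partition column are still much slower than I'd expect.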
Has anyone faced a similar issue, or does anyone have insights into optimizing performance for large Parquet files in Databricks?
Thanks in advance.