Large datasets in Databricks
11-30-2024 09:30 PM
How can I efficiently handle large datasets in Databricks when performing group-by operations to avoid out-of-memory errors? Are there any best practices or optimizations for improving performance, such as partitioning or caching, especially when working with Spark DataFrames?
1 REPLY
12-02-2024 06:01 AM
Hi, @maltasa
I believe this article might help answer your question.
Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads
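In the meantime, here is a minimal PySpark sketch of two of the points you mention. The table and column names ("sales", "customer_id", "amount", "order_id") are hypothetical; adjust them to your schema. The idea is to let built-in aggregate functions do map-side pre-aggregation before the shuffle, lean on Adaptive Query Execution to handle shuffle partition sizing and skewed keys, and cache only the small aggregated result rather than the raw data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Adaptive Query Execution coalesces shuffle partitions at runtime
# and mitigates skewed group-by keys (on by default in recent DBRs).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

df = spark.read.table("sales")  # hypothetical table name

# Prefer built-in aggregate functions: they pre-aggregate on each
# partition before the shuffle, so far less data crosses the network
# than with collect-style approaches or Python UDFs.
agg = (
    df.groupBy("customer_id")
      .agg(F.sum("amount").alias("total_amount"),
           F.countDistinct("order_id").alias("orders"))
)

# Cache only if the aggregated result is reused by several downstream
# actions; caching the raw df rarely helps and can itself cause OOM.
agg.cache()
agg.count()  # materialize the cache

agg.write.mode("overwrite").saveAsTable("customer_totals")
```

This is only a sketch; the article above covers these options (and others, like partitioning your Delta tables) in more depth.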
--------------------------
Takuya Omi (尾美拓哉)

