How can I efficiently run group-by aggregations over large datasets in Databricks without hitting out-of-memory errors? Are there best practices or optimizations for improving performance with Spark DataFrames, such as partitioning or caching?
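
For concreteness, here is a minimal sketch of the kind of aggregation I mean, in PySpark. The source path, column names (`customer_id`, `amount`), output path, and shuffle-partition count are all hypothetical placeholders:

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.storagelevel import StorageLevel

spark = SparkSession.builder.getOrCreate()  # provided as `spark` on Databricks

# Adaptive Query Execution lets Spark coalesce small shuffle partitions
# at runtime (enabled by default on recent Databricks runtimes).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.coalescePartitions.enabled", "true")

# More shuffle partitions -> smaller per-task state during the aggregation.
spark.conf.set("spark.sql.shuffle.partitions", "400")

# Hypothetical source table and schema.
df = spark.read.parquet("/mnt/data/transactions")

# Built-in aggregate functions perform a partial (map-side) aggregation
# before the shuffle, so far less data moves than with collect() or UDFs.
totals = (
    df.groupBy("customer_id")
      .agg(
          F.sum("amount").alias("total_amount"),
          F.count("*").alias("txn_count"),
      )
)

# Persist only if the result is reused downstream; MEMORY_AND_DISK spills
# to disk instead of failing when executors run short of memory.
totals.persist(StorageLevel.MEMORY_AND_DISK)

totals.write.mode("overwrite").parquet("/mnt/data/customer_totals")
```

In particular, I'm unsure when an explicit `df.repartition(...)` on the grouping key is worth its extra shuffle versus just letting AQE tune the shuffle partitions, and when caching helps rather than simply consuming executor memory.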