Large datasets in Databricks
11-30-2024 09:30 PM
How can I efficiently handle large datasets in Databricks when performing group-by operations to avoid out-of-memory errors? Are there any best practices or optimizations for improving performance, such as partitioning or caching, especially when working with Spark DataFrames?
1 REPLY
12-02-2024 06:01 AM
Hi, @maltasa
I believe this article might help answer your question.
Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads
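In the meantime, here is a minimal PySpark sketch of two of the points you mention. The table and column names ("sales", "customer_id", "amount", "order_id") are hypothetical; adjust them to your schema. The idea is to let built-in aggregate functions do map-side pre-aggregation before the shuffle, lean on Adaptive Query Execution to handle shuffle partition sizing and skewed keys, and cache only the small aggregated result rather than the raw data:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Adaptive Query Execution coalesces shuffle partitions at runtime
# and mitigates skewed group-by keys (on by default in recent DBRs).
spark.conf.set("spark.sql.adaptive.enabled", "true")
spark.conf.set("spark.sql.adaptive.skewJoin.enabled", "true")

df = spark.read.table("sales")  # hypothetical table name

# Prefer built-in aggregate functions: they pre-aggregate on each
# partition before the shuffle, so far less data crosses the network
# than with collect-style approaches or Python UDFs.
agg = (
    df.groupBy("customer_id")
      .agg(F.sum("amount").alias("total_amount"),
           F.countDistinct("order_id").alias("orders"))
)

# Cache only if the aggregated result is reused by several downstream
# actions; caching the raw df rarely helps and can itself cause OOM.
agg.cache()
agg.count()  # materialize the cache

agg.write.mode("overwrite").saveAsTable("customer_totals")
```

This is only a sketch; the article above covers these options (and others, like partitioning your Delta tables) in more depth.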
--------------------------
Takuya Omi (尾美拓哉)

