Spark Out of Memory Error
07-17-2024 11:40 PM
Background
I am using the R {sparklyr} package to fetch data from tables in Unity Catalog and hit the error below.
I tried the following, without success:
- Using a memory-optimized cluster (e.g., E4d).
- Using a cluster with more RAM (e.g., E8d).
- Enabling auto-scaling.
- Setting the following Spark config:
  - spark.driver.maxResultSize 4096
  - spark.memory.offHeap.enabled true
  - spark.driver.memory 8082
  - spark.executor.instances 4
  - spark.memory.offHeap.size 7284
  - spark.executor.memory 7284
  - spark.executor.cores 4
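As written, the memory values above carry no unit suffix, so Spark applies each key's default unit. Assuming the defaults given in the Spark configuration reference (bytes for spark.driver.maxResultSize and spark.memory.offHeap.size, MiB for spark.driver.memory and spark.executor.memory), the Python sketch below shows how those values would be read. The `to_bytes` helper and its unit tables are hypothetical and simplified: they only handle single-letter k/m/g/t suffixes.

```python
# Hypothetical helper: how Spark would interpret the unsuffixed values above,
# ASSUMING the default units from the Spark configuration reference.
DEFAULT_UNIT_BYTES = {
    "spark.driver.maxResultSize": 1,   # bytes
    "spark.memory.offHeap.size": 1,    # bytes
    "spark.driver.memory": 2**20,      # MiB
    "spark.executor.memory": 2**20,    # MiB
}

# Spark's size suffixes are binary: "g" means GiB, "m" means MiB, etc.
SUFFIX_BYTES = {"k": 2**10, "m": 2**20, "g": 2**30, "t": 2**40}


def to_bytes(key: str, value: str) -> int:
    """Convert a memory setting to bytes; unsuffixed values use the key's default unit."""
    value = value.strip().lower()
    if value and value[-1] in SUFFIX_BYTES:
        return int(value[:-1]) * SUFFIX_BYTES[value[-1]]
    return int(value) * DEFAULT_UNIT_BYTES[key]


posted = {
    "spark.driver.maxResultSize": "4096",
    "spark.driver.memory": "8082",
    "spark.memory.offHeap.size": "7284",
    "spark.executor.memory": "7284",
}

for key, value in posted.items():
    print(f"{key} {value} -> {to_bytes(key, value):,} bytes")
```

Under these assumptions, `spark.driver.maxResultSize 4096` would mean 4 KiB rather than 4 GiB, and `spark.memory.offHeap.size 7284` about 7 KiB. Writing an explicit suffix (e.g., `4g`, `7g`) removes the ambiguity.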
Error
07-22-2024 08:23 AM - edited 07-22-2024 08:27 AM
@Retired_mod, thanks for the detailed suggestions.
I believe the first reference is relevant to the issue. However, after setting spark.driver.maxResultSize to various values (e.g., 10g, 20g, 30g), a new error appears (see below).
The operation is a collect() on a Delta table with 380 MM rows and 5 columns (3.2 GB on disk, partitioned into 55 files). If the average row size is 48 bytes (per the initial error), shouldn't 20 GB be sufficient?
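The question can be checked with a quick back-of-envelope calculation. The Python sketch below uses only the figures from the post (row count, the 48-byte average row size from the initial error, and the 3.2 GB on-disk size). It assumes that Spark's `g` suffix means GiB (2^30 bytes). The helper names are hypothetical.

```python
# Back-of-envelope check: does a full collect() fit under spark.driver.maxResultSize?
# Inputs come from the post; the 48-byte average row size is the figure
# reported in the initial error, not an independent measurement.
ROWS = 380_000_000       # 380 MM rows
AVG_ROW_BYTES = 48       # average serialized row size (per initial error)
ON_DISK_BYTES = 3.2e9    # Delta table size on disk (3.2 GB)

GB = 10**9               # decimal gigabyte
GIB = 2**30              # Spark's "g" suffix is binary (GiB)


def estimated_result_bytes(rows: int, avg_row_bytes: int) -> int:
    """Approximate serialized size of all rows returned by collect()."""
    return rows * avg_row_bytes


def fits_limit(result_bytes: int, limit_gib: int) -> bool:
    """True if the estimate is below a maxResultSize of `limit_gib` g."""
    return result_bytes < limit_gib * GIB


result = estimated_result_bytes(ROWS, AVG_ROW_BYTES)
print(f"estimated result: {result / GB:.2f} GB ({result / GIB:.2f} GiB)")
print(f"on-disk size per row: {ON_DISK_BYTES / ROWS:.1f} bytes")
for limit in (10, 20, 30):
    headroom = (limit * GIB - result) / GIB
    print(f"maxResultSize={limit}g: fits={fits_limit(result, limit)}, "
          f"headroom={headroom:+.2f} GiB")
```

On these numbers, roughly 17 GiB must pass through the driver. A 20g limit clears the cap with about 3 GiB to spare, and 10g does not. However, maxResultSize only caps the serialized result. The driver JVM still has to hold the collected data in its heap, and the data must then fit in the R session's memory as well. Driver memory, not just maxResultSize, is therefore a plausible constraint. The figure of about 8.4 bytes per row on disk, compared with 48 bytes per row in memory, also shows how much the data grows once it is decompressed.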
New Error