SparkOutOfMemoryError when merging data into a table that already has data

vannipart · ‎07-17-2024

Hello,

I have an issue with merging data from a dataframe into a table. The job fails with this error:

2024 databricksJob aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8): org.apache.spark.memory.SparkOutOfMemoryError: [UNABLE_TO_ACQUIRE_MEMORY] Unable to acquire 28 bytes of memory, got 0.

The script:

df.createOrReplaceTempView("df_re")

%sql
MERGE INTO catalog.schema.table target USING df_re source
ON target.DB_ID = source.DB_ID
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
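
To make the intent of the merge explicit, here is a minimal plain-Python sketch of the upsert behaviour I expect from it. It only illustrates the matching logic on DB_ID, not how Spark or Delta execute it. The function name merge_upsert, the "value" column, and the sample rows are made up for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def merge_upsert(target: List[Row], source: List[Row], key: str = "DB_ID") -> List[Row]:
    """Emulate the intended result of the MERGE above on small in-memory lists.

    Assumption: source and target share the same columns and the key is unique
    in the source (a real MERGE raises an error if several source rows match
    the same target row; this sketch simply keeps the last one).
    """
    # Index the existing target rows by key; copy rows so inputs are not mutated.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # WHEN MATCHED THEN UPDATE SET *  -> the source row replaces the target row
        # WHEN NOT MATCHED THEN INSERT *  -> the source row is added
        merged[row[key]] = dict(row)
    return list(merged.values())


# Hypothetical sample data, not taken from the real table.
target = [{"DB_ID": 1, "value": "old"}, {"DB_ID": 2, "value": "keep"}]
source = [{"DB_ID": 1, "value": "new"}, {"DB_ID": 3, "value": "added"}]

result = merge_upsert(target, source)
print(result)
```

So rows with an existing DB_ID are overwritten with the source values, new DB_IDs are inserted, and untouched target rows stay as they are.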

The amount of data is small, around 200k rows or even fewer.

"node_type_id": "Standard_D16as_v5"

"spark_version": "14.3.x-scala2.12"

The cluster has no Spark configurations set.

Unity Catalog is in use, and the Delta tables are in an external location.

One thing to note: the notebook in which this merge runs has a lot of dataframes and other data transformations that build this dataframe, which is then registered as a temp view.

It is a mystery and I have no idea how to solve it. It is not a data issue, that is for sure.

Any tips and help are welcome.
