SparkOutOfMemoryError when merging data into a table that already has data

vannipart — Wed, 17 Jul 2024 10:16:46 GMT

Hello,

I am hitting an error when merging data from a DataFrame into a table:

Job aborted due to stage failure: Task 17 in stage 1770.0 failed 4 times, most recent failure: Lost task 17.3 in stage 1770.0 (TID 1669) (1x.xx.xx.xx executor 8): org.apache.spark.memory.SparkOutOfMemoryError: [UNABLE_TO_ACQUIRE_MEMORY] Unable to acquire 28 bytes of memory, got 0.

The script:

df.createOrReplaceTempView("df_re")

%sql

MERGE INTO catalog.schema.table target USING df_re source

ON target.DB_ID = source.DB_ID

WHEN MATCHED THEN UPDATE SET *

WHEN NOT MATCHED THEN INSERT *

The data volume is small: around 200k rows or fewer.

"node_type_id": "Standard_D16as_v5"

"spark_version": "14.3.x-scala2.12"

The cluster has no custom Spark configurations.

Unity Catalog is in use and the Delta tables are in an external location.

One thing to note: the notebook that runs this merge contains many DataFrames and other transformations that build this DataFrame, which is then registered as a temp view.

It is a mystery and I have no idea how to solve it. It is not a data issue, that is for sure.

Any tips or help are welcome.

Re: SparkOutOfMemoryError when merging data into a table that already has data

vannipart — Mon, 12 Aug 2024 05:59:15 GMT

Hello Kaniz_Fatma,

The problem wasn't related to any of the things listed above. It was bad data modelling and the way the relation inside the table was created. Remodelling the data helped.
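
For anyone landing here with the same error: the thread does not say what exactly was wrong with the model, so the following is only a sketch of one thing worth ruling out, namely a merge key that is not unique. If DB_ID repeats on either side, the join behind the MERGE can match far more row pairs than either input has rows. The table name df_re and the column DB_ID are taken from the post above; the sample rows and the helper find_duplicate_keys are made up for illustration, and sqlite3 is used only as a local stand-in so the query can be tried without a cluster.

```python
import sqlite3

# Portable SQL: lists merge-key values that occur more than once.
# df_re and DB_ID come from the post; everything else here is hypothetical.
DUPLICATE_KEYS_SQL = """
SELECT DB_ID, COUNT(*) AS n
FROM df_re
GROUP BY DB_ID
HAVING COUNT(*) > 1
ORDER BY DB_ID
"""


def find_duplicate_keys(conn):
    """Return (DB_ID, count) pairs for keys that appear more than once."""
    return conn.execute(DUPLICATE_KEYS_SQL).fetchall()


# In-memory stand-in for the df_re temp view, filled with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE df_re (DB_ID INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO df_re VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c"), (3, "d"), (3, "e"), (3, "f")],
)

duplicates = find_duplicate_keys(conn)
print(duplicates)  # [(2, 2), (3, 3)]
```

The same GROUP BY / HAVING query should also run as-is in a %sql cell against df_re and against the target table. An empty result on both sides means the key is unique and the cause lies elsewhere.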
